Attorney Docket: M6308-3 EMY/ OAPT f646-421 

CYTOCHROME P450 MONOOXYGENASE AND 
NADPH CYTOCHROME P450 OXIDOREDUCTASE GENES AND 
PROTEINS RELATED TO THE OMEGA HYDROXYLASE COMPLEX OF 
CANDIDA TROPICALIS AND METHODS RELATING THERETO

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Application Serial No. 
60/103,099 filed October 5, 1998, and U.S. Provisional Application Serial No. 60/083,798 filed 
May 1, 1998. 

BACKGROUND 

1. Field of the Invention 

The present invention relates to novel genes which encode enzymes of the ω-hydroxylase
complex in yeast Candida tropicalis strains. In particular, the invention relates to
novel genes encoding the cytochrome P450 and NADPH reductase enzymes of the ω-hydroxylase
complex in yeast Candida tropicalis, and to a method of quantitating the expression
of genes.

2. Description of the Related Art

Aliphatic dioic acids are versatile chemical intermediates useful as raw 
materials for the preparation of perfumes, polymers, adhesives and macrolide antibiotics. While
several chemical routes to the synthesis of long-chain alpha,ω-dicarboxylic acids are available,
the synthesis is not easy and most methods result in mixtures containing shorter chain lengths. 
As a result, extensive purification steps are necessary. While it is known that long-chain dioic 
acids can also be produced by microbial transformation of alkanes, fatty acids or esters thereof, 
chemical synthesis has remained the most commercially viable route, due to limitations with the 
current biological approaches. 

Several strains of yeast are known to excrete alpha,ω-dicarboxylic acids as a
byproduct when cultured on alkanes or fatty acids as the carbon source. In particular, yeast
belonging to the genus Candida, such as C. albicans, C. cloacae, C. guillermondii, C.
intermedia, C. lipolytica, C. maltosa, C. parapsilosis and C. zeylenoides, are known to produce
such dicarboxylic acids (Agr. Biol. Chem. 35:2033-2042 (1971)). Also, various strains of C.
tropicalis are known to produce dicarboxylic acids ranging in chain lengths from C[illegible] through C[illegible]
(Okino et al., in B. M. Lawrence, B. D. Mookherjee and B. J. Willis (eds), Flavors and Fragrances: A
World Perspective, Proceedings of the 10th International Conference of Essential Oils, Flavors
and Fragrances, Elsevier Science Publishers BV, Amsterdam (1988)), and are the basis of several
patents as reviewed by Bühler and Schindler, in Aliphatic Hydrocarbons in Biotechnology, H. J.
Rehm and G. Reed (eds), Vol. 169, Verlag Chemie, Weinheim (1984).

Studies of the biochemical processes by which yeasts metabolize alkanes and fatty
acids have revealed three types of oxidation reactions: α-oxidation of alkanes to alcohols,
ω-oxidation of fatty acids to alpha,ω-dicarboxylic acids and the degradative β-oxidation of fatty
acids to CO2 and water. The first two types of oxidations are catalyzed by microsomal enzymes
while the last type takes place in the peroxisomes. In C. tropicalis, the first step in the
ω-oxidation pathway is catalyzed by a membrane-bound enzyme complex (ω-hydroxylase
complex) including a cytochrome P450 monooxygenase and an NADPH cytochrome reductase.

This hydroxylase complex is responsible for the primary oxidation of the terminal methyl group
in alkanes and fatty acids (Gilewicz et al., Can. J. Microbiol. 25:201 (1979)). The genes which
encode the cytochrome P450 and NADPH reductase components of the complex have previously
been identified as P450ALK and P450RED respectively, and have also been cloned and
sequenced (Sanglard et al., Gene 76:121-136 (1989)). P450ALK has also been designated
P450ALK1. More recently, ALK genes have been designated by the symbol CYP and RED
genes have been designated by the symbol CPR. See, e.g., Nelson, Pharmacogenetics 6(1):1-42
(1996), which is incorporated herein by reference. See also Ohkuma et al., DNA and Cell
Biology 14:163-173 (1995), Seghezzi et al., DNA and Cell Biology 11:767-780 (1992) and
Kargel et al., Yeast 12:333-348 (1996), each incorporated herein by reference. For example,
P450ALK is also designated CYP52 according to the nomenclature of Nelson, supra. Fatty acids
are ultimately formed from alkanes after two additional oxidation steps, catalyzed by alcohol
oxidase (Kemp et al., Appl. Microbiol. Biotechnol. 28:370-374 (1988)) and aldehyde
dehydrogenase. The fatty acids can be further oxidized through the same or similar pathway to
the corresponding dicarboxylic acid. The ω-oxidation of fatty acids proceeds via the ω-hydroxy
fatty acid and its aldehyde derivative to the corresponding dicarboxylic acid, without the
requirement for CoA activation. However, both fatty acids and dicarboxylic acids can be
degraded, after activation to the corresponding acyl-CoA ester, through the β-oxidation pathway
in the peroxisomes, leading to chain shortening. In mammalian systems, both fatty acid and
dicarboxylic acid products of ω-oxidation are activated to their CoA esters at equal rates and are
substrates for both mitochondrial and peroxisomal β-oxidation (J. Biochem. 102:225-234
(1987)). In yeast, β-oxidation takes place solely in the peroxisomes (Agr. Biol. Chem.
49:1821-1828 (1985)).

The production of dicarboxylic acids by fermentation of unsaturated C[illegible]-C16
monocarboxylic acids using a strain of the species C. tropicalis is disclosed in U.S. Patent
4,474,882. The unsaturated dicarboxylic acids correspond to the starting materials in the number
and position of the double bonds. Similar processes in which other special microorganisms are
used are described in U.S. Patents 3,975,234 and 4,339,536, in British Patent Specification
1,405,026 and in German Patent Publications 21 64 626, 28 53 847, 29 37 292, 29 51 177, and
21 40 133.

Cytochromes P450 (P450s) are terminal monooxidases of a
multicomponent enzyme system as described above. They comprise a superfamily of proteins
which exist widely in nature, having been isolated from a variety of organisms as described, e.g.,
in Nelson, supra. These organisms include various mammals, fish, invertebrates, plants,
mollusks, crustaceans, lower eukaryotes and bacteria (Nelson, supra). First discovered in rodent
liver microsomes as a carbon monoxide-binding pigment as described, e.g., in Garfinkel, Arch.
Biochem. Biophys. 77:493-509 (1958), which is incorporated herein by reference, P450s were
later named based on their absorption at 450 nm in a reduced-CO coupled difference spectrum
as described, e.g., in Omura et al., J. Biol. Chem. 239:2370-2378 (1964), which is incorporated
herein by reference.

P450s catalyze the metabolism of a variety of endogenous and exogenous
compounds (Nelson, supra). Endogenous compounds include steroids, prostanoids, eicosanoids,
fat-soluble vitamins, fatty acids, mammalian alkaloids, leukotrienes, biogenic amines and
phytoalexins (Nelson, supra). P450 metabolism involves such reactions as epoxidation,
hydroxylation, dealkylation, N-hydroxylation, sulfoxidation, desulfuration and reductive
dehalogenation. These reactions generally make the compound more water soluble, which is
conducive to excretion, and more electrophilic. These electrophilic products can have
detrimental effects if they react with DNA or other cellular constituents. However, they can react
through conjugation with low molecular weight hydrophilic substances, resulting in
glucuronidation, sulfation, acetylation, amino acid conjugation or glutathione conjugation,
typically leading to inactivation and elimination as described, e.g., in Klaassen et al., Toxicology,
3rd ed., Macmillan, New York, 1986, incorporated herein by reference.

P450s are heme thiolate proteins consisting of a heme moiety bound to a single
polypeptide chain of 45,000 to 55,000 Da. The iron of the heme prosthetic group is located at
the center of a protoporphyrin ring. Four ligands of the heme iron can be attributed to the
porphyrin ring. The fifth ligand is a thiolate anion from a cysteinyl residue of the polypeptide.
The sixth ligand is probably a hydroxyl group from an amino acid residue, or a moiety with a
similar field strength such as a water molecule, as described, e.g., in Goeptar et al., Critical
Reviews in Toxicology 25(1):25-65 (1995), incorporated herein by reference.

Monooxygenation reactions catalyzed by cytochromes P450 in a eukaryotic
membrane-bound system require the transfer of electrons from NADPH to P450 via
NADPH-cytochrome P450 reductase (CPR) as described, e.g., in Taniguchi et al., Arch. Biochem.
Biophys. 232:585 (1984), incorporated herein by reference. CPR is a flavoprotein of
approximately 78,000 Da containing 1 mol of flavin adenine dinucleotide (FAD) and 1 mol of
flavin mononucleotide (FMN) per mole of enzyme as described, e.g., in Potter et al., J. Biol.
Chem. 258:6906 (1983), incorporated herein by reference. The FAD moiety of CPR is the site of
electron entry into the enzyme, whereas FMN is the electron-donating site to P450 as described,
e.g., in Vermilion et al., J. Biol. Chem. 253:8812 (1978), incorporated herein by reference. The
overall reaction is as follows:

$$\mathrm{H^+ + RH + NADPH + O_2 \longrightarrow ROH + NADP^+ + H_2O}$$
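As an illustrative check on the overall reaction above, the following minimal Python sketch tallies hydrogen, oxygen and net charge on each side. It uses a simplified bookkeeping model, which is an assumption of this example: R and the NADP moiety are treated as opaque groups, and NADPH is written as NADP plus one H. It is not a full molecular-formula calculation.

```python
from collections import Counter

# A species is (atom/group counts, net charge). "R" and "NADP" are opaque
# groups in this simplified model; NADPH is modeled as NADP + H.
Species = tuple[Counter, int]


def species(charge: int = 0, **parts: int) -> Species:
    return Counter(parts), charge


LEFT = [
    species(charge=1, H=1),     # H+
    species(R=1, H=1),          # RH (substrate)
    species(NADP=1, H=1),       # NADPH
    species(O=2),               # O2
]
RIGHT = [
    species(R=1, O=1, H=1),     # ROH (hydroxylated product)
    species(charge=1, NADP=1),  # NADP+
    species(H=2, O=1),          # H2O
]


def totals(side: list[Species]) -> tuple[Counter, int]:
    """Sum atom/group counts and net charge over one side of the reaction."""
    atoms: Counter = Counter()
    charge = 0
    for parts, q in side:
        atoms.update(parts)
        charge += q
    return atoms, charge


def is_balanced(left: list[Species], right: list[Species]) -> bool:
    return totals(left) == totals(right)


print(is_balanced(LEFT, RIGHT))  # True
```

Both sides carry three H, two O, one R, one NADP and a net charge of +1. Dropping the proton from the left side breaks both the hydrogen and the charge balance.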

Binding of a substrate to the catalytic site of P450 apparently results in a
conformational change initiating electron transfer from CPR to P450. Subsequent to the transfer
of the first electron, O2 binds to the Fe2+-P450-substrate complex to form the Fe3+-P450-substrate
complex. This complex is then reduced by a second electron from CPR or, in some cases, from
NADH via cytochrome b5 and NADH-cytochrome b5 reductase as described, e.g., in Guengerich
et al., Arch. Biochem. Biophys. 205:365 (1980), incorporated herein by reference. One atom of
this reactive oxygen is introduced into the substrate, while the other is reduced to water. The
sequence as set forth in SEQ ID NO: 83. A method of producing a CPRA protein including an
amino acid sequence as set forth in SEQ ID NO: 83 is also provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 83; and b) culturing the cell under conditions favoring
the expression of the protein.

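An isolated nucleic acid is provided which encodes a CPRB protein having the
amino acid sequence set forth in SEQ ID NO: 84. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 1033-30[illegible] as set forth in SEQ ID NO: 82. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
84. A vector is provided which includes a nucleotide sequence encoding CPRB protein
including an amino acid sequence as set forth in SEQ ID NO: 84. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CPRB protein having an amino
acid sequence as set forth in SEQ ID NO: 84. A method of producing a CPRB protein including
an amino acid sequence as set forth in SEQ ID NO: 84 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 84; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid encoding a CYP52A1A protein is provided having
the amino acid sequence set forth in SEQ ID NO: 95. An isolated nucleic acid is provided
which includes a coding region defined by nucleotides 1177-2748 as set forth in SEQ ID NO: 85.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID
NO: 95. A vector is provided which includes a nucleotide sequence encoding CYP52A1A protein
including an amino acid sequence as set forth in SEQ ID NO: 95. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A1A protein having an amino
acid sequence as set forth in SEQ ID NO: 95. A method of producing a CYP52A1A protein
including an amino acid sequence as set forth in SEQ ID NO: 95 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 95; and b) culturing the cell under conditions favoring
the expression of the protein.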



An isolated nucleic acid encoding a CYP52A2A protein is provided which has the 
amino acid sequence set forth in SEQ ID NO: 96. An isolated nucleic acid is provided which 
includes a coding region defined by nucleotides 1199-2767 as set forth in SEQ ID NO: 86. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
96. A vector is provided which includes a nucleotide sequence encoding CYP52A2A protein
including an amino acid sequence as set forth in SEQ ID NO: 96. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A2A protein having an amino
acid sequence as set forth in SEQ ID NO: 96. A method of producing a CYP52A2A protein
including an amino acid sequence as set forth in SEQ ID NO: 96 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 96; and b) culturing the cell under conditions favoring
the expression of the protein.
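The coding regions in this section are given as inclusive nucleotide ranges. As a worked illustration only, assume each range runs from the first base of the start codon through the last base of the stop codon; the text does not state this explicitly. Under that assumption, each span length should be a multiple of three, and the encoded protein should have one residue fewer than the number of codons. The hypothetical helpers below apply this arithmetic to ranges quoted in this section. They do not check the ranges against the sequence listing.

```python
def codon_count(start: int, end: int) -> int:
    """Number of codons in an inclusive nucleotide range [start, end]."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} spans {length} nt, not a whole number of codons")
    return length // 3


def residues_encoded(start: int, end: int, includes_stop: bool = True) -> int:
    """Amino acids encoded; drops one codon if the range is assumed to include the stop codon."""
    n = codon_count(start, end)
    return n - 1 if includes_stop else n


# Coding regions quoted in this section (label: (first nt, last nt)).
REGIONS = {
    "CYP52A2A, SEQ ID NO: 86": (1199, 2767),
    "CYP52A2B, SEQ ID NO: 87": (1072, 2640),
    "CYP52A3A, SEQ ID NO: 88": (1126, 2748),
    "CYP52A3B, SEQ ID NO: 89": (913, 2535),
}

for label, (first, last) in REGIONS.items():
    print(label, codon_count(first, last), residues_encoded(first, last))
```

For example, nucleotides 1199-2767 span 1569 nt, which is 523 codons. Under the stated assumption, that corresponds to a 522-residue protein.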

An isolated nucleic acid encoding a CYP52A2B protein is provided which has the 
amino acid sequence set forth in SEQ ID NO: 97. An isolated nucleic acid is provided which 
includes a coding region defined by nucleotides 1072-2640 as set forth in SEQ ID NO: 87. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
97. A vector is provided which includes a nucleotide sequence encoding CYP52A2B protein
including an amino acid sequence as set forth in SEQ ID NO: 97. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A2B protein having an amino
acid sequence as set forth in SEQ ID NO: 97. A method of producing a CYP52A2B protein
including an amino acid sequence as set forth in SEQ ID NO: 97 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 97; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid encoding a CYP52A3A protein is provided which has
the amino acid sequence set forth in SEQ ID NO: 98. An isolated nucleic acid is provided
which includes a coding region defined by nucleotides 1126-2748 as set forth in SEQ ID NO: 88.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID
NO: 98. A vector is provided which includes a nucleotide sequence encoding CYP52A3A
protein including an amino acid sequence as set forth in SEQ ID NO: 98. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52A3A protein having an
amino acid sequence as set forth in SEQ ID NO: 98. A method of producing a CYP52A3A
protein including an amino acid sequence as set forth in SEQ ID NO: 98 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 98; and b) culturing the cell under
conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52A3B protein is provided having the
amino acid sequence as set forth in SEQ ID NO: 99. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides 913-2535 as set forth in SEQ ID NO: 89. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
99. A vector is provided which includes a nucleotide sequence encoding CYP52A3B protein
including an amino acid sequence as set forth in SEQ ID NO: 99. A host cell is provided which
is transfected or transformed with the nucleic acid encoding CYP52A3B protein having an amino
acid sequence as set forth in SEQ ID NO: 99. A method of producing a CYP52A3B protein
including an amino acid sequence as set forth in SEQ ID NO: 99 is provided which includes a)
transforming a suitable host cell with a DNA sequence that encodes the protein having the amino
acid sequence as set forth in SEQ ID NO: 99; and b) culturing the cell under conditions favoring
the expression of the protein.

An isolated nucleic acid encoding a CYP52A5A protein is provided having the
amino acid sequence set forth in SEQ ID NO: 100. An isolated nucleic acid is provided which
includes a coding region defined by nucleotides [illegible] as set forth in SEQ ID NO: 90. An
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO:
100. A vector is provided which includes a nucleotide sequence encoding CYP52A5A protein
including an amino acid sequence as set forth in SEQ ID NO: 100. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52A5A protein having an
amino acid sequence as set forth in SEQ ID NO: 100. A method of producing a CYP52A5A
protein including an amino acid sequence as set forth in SEQ ID NO: 100 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 100; and b) culturing the cell under
conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52A5B protein is provided having the
amino acid sequence as set forth in SEQ ID NO: 101. An isolated nucleic acid is provided
which includes a coding region defined by nucleotides 1142-2695 as set forth in SEQ ID NO: 91.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID 
NO: 101. A vector is provided which includes a nucleotide sequence encoding CYP52A5B 
protein including the amino acid sequence as set forth in SEQ ID NO: 101. A host cell is 
provided which is transfected or transformed with the nucleic acid encoding CYP52A5B protein 
having the amino acid sequence as set forth in SEQ ID NO: 101. A method of producing a 
CYP52A5B protein including an amino acid sequence as set forth in SEQ ID NO: 101 is 
provided which includes a) transforming a suitable host cell with a DNA sequence that encodes 
the protein having the amino acid sequence as set forth in SEQ ID NO: 101; and b) culturing the 
cell under conditions favoring the expression of the protein. 

An isolated nucleic acid encoding a CYP52A8A protein is provided having the 
amino acid sequence set forth in SEQ ID NO: 102. An isolated nucleic acid is provided which 
includes a coding region defined by nucleotides 464-2002 as set forth in SEQ ID NO: 92. An 
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO: 
102. A vector is provided which includes a nucleotide sequence encoding CYP52A8A protein 
including an amino acid sequence as set forth in SEQ ID NO: 102. A host cell is provided 
which is transfected or transformed with the nucleic acid encoding CYP52A8A protein having an 
amino acid sequence as set forth in SEQ ID NO: 102. A method of producing a CYP52A8A 
protein including an amino acid sequence as set forth in SEQ ID NO: 102 is provided which 
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein 
having the amino acid sequence as set forth in SEQ ID NO: 102; and b) culturing the cell under 
conditions favoring the expression of the protein. 

An isolated nucleic acid encoding a CYP52A8B protein is provided having the 
amino acid sequence set forth in SEQ ID NO: 103. An isolated nucleic acid is provided which 
includes a coding region defined by nucleotides 1017-2555 as set forth in SEQ ID NO: 93. An 
isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID NO: 
103. A vector is provided which includes a nucleotide sequence encoding CYP52A8B protein 
including an amino acid sequence as set forth in SEQ ID NO: 103. A host cell is provided 
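which is transfected or transformed with the nucleic acid encoding CYP52A8B protein having an
amino acid sequence as set forth in SEQ ID NO: 103. A method of producing a CYP52A8B
protein including an amino acid sequence as set forth in SEQ ID NO: 103 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 103; and b) culturing the cell under
conditions favoring the expression of the protein.

An isolated nucleic acid encoding a CYP52D4A protein is provided having the
amino acid sequence set forth in SEQ ID NO: 104. An isolated nucleic acid is provided
which includes a coding region defined by nucleotides [illegible] as set forth in SEQ ID NO: 94.
An isolated protein is provided which includes an amino acid sequence as set forth in SEQ ID
NO: 104. A vector is provided which includes a nucleotide sequence encoding CYP52D4A
protein including an amino acid sequence as set forth in SEQ ID NO: 104. A host cell is provided
which is transfected or transformed with the nucleic acid encoding CYP52D4A protein having an
amino acid sequence as set forth in SEQ ID NO: 104. A method of producing a CYP52D4A
protein including an amino acid sequence as set forth in SEQ ID NO: 104 is provided which
includes a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 104; and b) culturing the cell under
conditions favoring the expression of the protein.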

[illegible] of target [illegible] by quantifying the amount [illegible]
target gene; b) culturing the organism with an organic substrate which causes upregulation in the
[illegible] competitor RNA to form an RNA mixture, wherein [illegible]
target DNA and competitor DNA; f) conducting a polymerase chain reaction in the presence of
[illegible] using increasing amounts of [illegible]
amount of target RNA; h) determining the point at which the amount of [illegible] DNA [illegible]
ratio of the concentration of unknown target to the known [illegible] of [illegible]
obtaining a sample of total RNA from the organism at another point in time and repeating steps
(d-i).
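The quantitation steps above follow the general logic of competitive RT-PCR. Known, increasing amounts of competitor are co-amplified with the target, and the target amount is read off at the point where the two products are equal. The minimal Python sketch below assumes one common way of locating that equivalence point, which the text does not specify: a least-squares fit of log10(target/competitor product) against log10(competitor input), solved for a log ratio of zero. The function names and example data are hypothetical.

```python
import math
from typing import Sequence


def equivalence_point(
    competitor_inputs: Sequence[float],
    target_signals: Sequence[float],
    competitor_signals: Sequence[float],
) -> float:
    """Estimate the competitor input at which target and competitor products are equal.

    Fits log10(target/competitor signal) = slope * log10(competitor input) + intercept
    by ordinary least squares and solves for a log ratio of zero (ratio == 1).
    """
    if not (len(competitor_inputs) == len(target_signals) == len(competitor_signals)):
        raise ValueError("all input sequences must have the same length")
    if len(competitor_inputs) < 2:
        raise ValueError("need at least two competitor dilutions")
    xs = [math.log10(c) for c in competitor_inputs]
    ys = [math.log10(t / c) for t, c in zip(target_signals, competitor_signals)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("competitor inputs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("log ratio does not change with competitor input")
    intercept = mean_y - slope * mean_x
    return 10 ** (-intercept / slope)


def fold_change(amount_first: float, amount_later: float) -> float:
    """Relative change in target amount between two sampling times."""
    return amount_later / amount_first


# Hypothetical, idealized data: product ratio equals input ratio (slope of -1).
competitor = [1.0, 2.0, 4.0, 8.0, 16.0]
t0 = equivalence_point(competitor, [4.0] * 5, competitor)   # first time point
t1 = equivalence_point(competitor, [12.0] * 5, competitor)  # after induction
print(round(t0, 6), round(t1, 6), round(fold_change(t0, t1), 6))
```

Working on log ratios makes the relationship linear when target and competitor amplify with similar efficiency, which is the usual premise of competitive PCR. Repeating the estimate at a second time point gives the change in target expression.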

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CPRA genes; b)
increasing, in the host cell, the number of CPRA genes which encode a CPRA protein having the
amino acid sequence as set forth in SEQ ID NO: 83; c) culturing the host cell in media
containing an organic substrate which upregulates the CPRA gene, to effect increased production
of dicarboxylic acid.

A method for increasing the production of a CPRA protein having an amino acid
sequence as set forth in SEQ ID NO: 83 is provided which includes a) transforming a host cell
having a naturally occurring amount of CPRA protein with an increased copy number of a CPRA 
gene that encodes the CPRA protein having the amino acid sequence as set forth in SEQ ID NO: 
83; and b) culturing the cell and thereby increasing expression of the protein compared with that 
of a host cell containing a naturally occurring copy number of the CPRA gene. 

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CPRB genes; b)
increasing, in the host cell, the number of CPRB genes which encode a CPRB protein having the
amino acid sequence as set forth in SEQ ID NO: 84; c) culturing the host cell in media
containing an organic substrate which upregulates the CPRB gene, to effect increased production
of dicarboxylic acid.

A method for increasing the production of a CPRB protein having an amino acid 
sequence as set forth in SEQ ID NO: 84 is provided which includes a) transforming a host cell 
having a naturally occurring amount of CPRB protein with an increased copy number of a CPRB 
gene that encodes the CPRB protein having the amino acid sequence as set forth in SEQ ID NO: 
84; and b) culturing the cell and thereby increasing expression of the protein compared with that
of a host cell containing a naturally occurring copy number of the CPRB gene. 

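A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A1A genes; b)
increasing, in the host cell, the number of CYP52A1A genes which encode a CYP52A1A protein
having the amino acid sequence as set forth in SEQ ID NO: 95; c) culturing the host cell in media
containing an organic substrate which upregulates the CYP52A1A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A1A protein having an amino
acid sequence as set forth in SEQ ID NO: 95 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A1A protein with an increased copy number of
a CYP52A1A gene that encodes the CYP52A1A protein having the amino acid sequence as set
forth in SEQ ID NO: 95; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A1A gene.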
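
A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A2A genes; b)
increasing, in the host cell, the number of CYP52A2A genes which encode a CYP52A2A protein
having the amino acid sequence as set forth in SEQ ID NO: 96; c) culturing the host cell in media
containing an organic substrate which upregulates the CYP52A2A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A2A protein having an amino
acid sequence as set forth in SEQ ID NO: 96 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A2A protein with an increased copy number of
a CYP52A2A gene that encodes the CYP52A2A protein having the amino acid sequence as set
forth in SEQ ID NO: 96; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A2A gene.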

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A2B genes; b)
increasing, in the host cell, the number of CYP52A2B genes which encode a CYP52A2B protein
having the amino acid sequence as set forth in SEQ ID NO: 97; c) culturing the host cell in media
containing an organic substrate which upregulates the CYP52A2B gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A2B protein having an amino
acid sequence as set forth in SEQ ID NO: 97 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A2B protein with an increased copy number of
a CYP52A2B gene that encodes the CYP52A2B protein having the amino acid sequence as set
forth in SEQ ID NO: 97; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A2B gene. 

A method for increasing production of a dicarboxylic acid is provided which 
includes a) providing a host cell having a naturally occurring number of CYP52A3A genes; b) 
increasing, in the host cell, the number of CYP52A3A genes which encode a CYP52A3A protein 
having the amino acid sequence as set forth in SEQ ID NO: 98; c) culturing the host cell in media 
containing an organic substrate which upregulates the CYP52A3A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A3A protein having an amino 
acid sequence as set forth in SEQ ID NO: 98 is provided which includes a) transforming a host 
cell having a naturally occurring amount of CYP52A3A protein with an increased copy number of 
a CYP52A3A gene that encodes the CYP52A3A protein having the amino acid sequence as set 
forth in SEQ ID NO: 98; and b) culturing the cell and thereby increasing expression of the 
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A3A gene. 

A method for increasing production of a dicarboxylic acid is provided which 
includes a) providing a host cell having a naturally occurring number of CYP52A3B genes; b) 
increasing, in the host cell, the number of CYP52A3B genes which encode a CYP52A3B protein
having the amino acid sequence as set forth in SEQ ID NO: 99; c) culturing the host cell in media 
containing an organic substrate which upregulates the CYP52A3B gene, to effect increased 

production of dicarboxylic acid. 

A method for increasing the production of a CYP52A3B protein having an amino 
acid sequence as set forth in SEQ ID NO: 99 is provided which includes a) transforming a host 
cell having a naturally occurring amount of CYP52A3B protein with an increased copy number of 
a CYP52A3B gene that encodes the CYP52A3B protein having the amino acid sequence as set 
forth in SEQ ID NO: 99; and b) culturing the cell and thereby increasing expression of the 
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A3B gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A5A genes; b)
increasing, in the host cell, the number of CYP52A5A genes which encode a CYP52A5A protein
having the amino acid sequence as set forth in SEQ ID NO: 100; c) culturing the host cell in
media containing an organic substrate which upregulates the CYP52A5A gene, to effect increased
production of dicarboxylic acid. 

A method for increasing the production of a CYP52A5A protein having an amino
acid sequence as set forth in SEQ ID NO: 100 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A5A protein with an increased copy number of
a CYP52A5A gene that encodes the CYP52A5A protein having the amino acid sequence as set
forth in SEQ ID NO: 100; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A5A gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A5B genes; b)
increasing, in the host cell, the number of CYP52A5B genes which encode a CYP52A5B protein
having the amino acid sequence as set forth in SEQ ID NO: 101; c) culturing the host cell in
media containing an organic substrate which upregulates the CYP52A5B gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A5B protein having an amino
acid sequence as set forth in SEQ ID NO: 101 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A5B protein with an increased copy number of
a CYP52A5B gene that encodes the CYP52A5B protein having the amino acid sequence as set
forth in SEQ ID NO: 101; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52A5B gene.

A method for increasing production of a dicarboxylic acid is provided which
includes a) providing a host cell having a naturally occurring number of CYP52A8A genes; b)
increasing, in the host cell, the number of CYP52A8A genes which encode a CYP52A8A protein
having the amino acid sequence as set forth in SEQ ID NO: 102; c) culturing the host cell in
media containing an organic substrate which upregulates the CYP52A8A gene, to effect increased
production of dicarboxylic acid.

A method for increasing the production of a CYP52A8A protein having an amino
acid sequence as set forth in SEQ ID NO: 102 is provided which includes a) transforming a host 
cell having a naturally occurring amount of CYP52A8A protein with an increased copy number of 
a CYP52A8A gene that encodes the CYP52A8A protein having the amino acid sequence as set 
forth in SEQ ID NO: 102; and b) culturing the cell and thereby increasing expression of the 
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A8A gene. 

A method for increasing production of a dicarboxylic acid is provided which 
includes a) providing a host cell having a naturally occurring number of CYP52A8B genes; b) 
increasing, in the host cell, the number of CYP52A8B genes which encode a CYP52A8B protein 
having the amino acid sequence as set forth in SEQ ID NO: 103; c) culturing the host cell in 
media containing an organic substrate which upregulates the CYP52A8B gene, to effect increased 
production of dicarboxylic acid. 

A method for increasing the production of a CYP52A8B protein having an amino 
acid sequence as set forth in SEQ ID NO: 103 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52A8B protein with an increased copy number of 
a CYP52A8B gene that encodes the CYP52A8B protein having the amino acid sequence as set 
forth in SEQ ID NO: 103; and b) culturing the cell and thereby increasing expression of the 
protein compared with that of a host cell containing a naturally occurring copy number of the 
CYP52A8B gene. 

A method for increasing production of a dicarboxylic acid is provided which 
includes a) providing a host cell having a naturally occurring number of CYP52D4A genes; b) 
increasing, in the host cell, the number of CYP52D4A genes which encode a CYP52D4A protein 
having the amino acid sequence as set forth in SEQ ID NO: 104; c) culturing the host cell in 
media containing an organic substrate which upregulates the CYP52D4A gene, to effect increased 
production of dicarboxylic acid. 

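A method for increasing the production of a CYP52D4A protein having an amino
acid sequence as set forth in SEQ ID NO: 104 is provided which includes a) transforming a host
cell having a naturally occurring amount of CYP52D4A protein with an increased copy number
of a CYP52D4A gene that encodes the CYP52D4A protein having the amino acid sequence as set
forth in SEQ ID NO: 104; and b) culturing the cell and thereby increasing expression of the
protein compared with that of a host cell containing a naturally occurring copy number of the
CYP52D4A gene.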

5 

BRIEF T)FSrPlPno N ng m np kxumn 
Figure 1 is a schematic representation of cloning vector pTriplEx from
Clontech Laboratories, Inc. Selected restriction sites within the multiple cloning site are
shown.

Figure 2A is a map of the ZAP Express™ vector.

Figure 2B is a schematic representation of cloning phagemid vector pBK-CMV.

Figure 3 is a double stranded DNA sequence of a portion of the 5' coding
region of the CYP52A5A gene (SEQ ID NO: 36).

Figure 4 is a diagrammatic representation of highly conserved regions of CYP and
CPR gene protein sequences. Helix I represents the putative substrate binding site and HR2
represents the heme binding region. The FMN, FAD and NADPH binding regions are indicated
below the CPR gene.

Figure 5 is a diagrammatic representation of the plasmid pHKM1 containing the
truncated CPRA gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame.

Figure 6 is a diagrammatic representation of the plasmid pHKM4 containing the
truncated CPRA gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame.

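Figure 7 is a diagrammatic representation of the plasmid pHKM9 containing the
CPRB gene present in the pBK-CMV vector. A detailed restriction map of only the sequenced
region is shown at the top. The bar indicates the open reading frame. The
direction of transcription is indicated by an arrow under the open reading frame.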

Figure 8 is a diagrammatic representation of the plasmid pHKM11 containing the
CYP52A1A gene (SEQ ID NO: 85) present in the pBK-CMV vector. A detailed restriction map
of only the sequenced region is shown at the top. The bar indicates the open reading frame. The
direction of transcription is indicated by an arrow under the open reading frame.

Figure 9 is a diagrammatic representation of the plasmid pHKM12 containing the 
CYP52A8A gene (SEQ ID NO: 92) present in the pBK-CMV vector. A detailed restriction map 
of only the sequenced region is shown at the top. The bar indicates the open reading frame. The
direction of transcription is indicated by an arrow under the open reading frame. 

Figure 10 is a diagrammatic representation of the plasmid pHKM13 containing 
the CYP52D4A gene (SEQ ID NO: 94) present in the pBK-CMV vector. A detailed restriction 
map of only the sequenced region is shown at the top. The bar indicates the open reading frame. 
The direction of transcription is indicated by an arrow under the open reading frame.

Figure 11 is a diagrammatic representation of the plasmid pHKM14 containing
the CYP52A2B gene (SEQ ID NO: 87) present in the pBK-CMV vector. A detailed restriction
map of only the sequenced region is shown at the top. The bar indicates the open reading frame. 
The direction of transcription is indicated by an arrow under the open reading frame. 

Figure 12 is a diagrammatic representation of the plasmid pHKM15 containing
the CYP52A8B gene (SEQ ID NO: 93) present in the pBK-CMV vector. A detailed restriction
map of only the sequenced region is shown at the top. The bar indicates the open reading frame. 
The direction of transcription is indicated by an arrow under the open reading frame. 

Figures 13A-13D show the complete DNA sequences including regulatory and 
coding regions for the CPRA gene (SEQ ID NO: 81) and CPRB gene (SEQ ID NO: 82) from C.
tropicalis ATCC 20336. Figures 13A-13D show regulatory and coding region alignment of 
these sequences. Asterisks indicate conserved nucleotides. Bold indicates protein coding 
nucleotides; the start and stop codons are underlined. 

Figure 14 shows the amino acid sequence of the CPRA (SEQ ID NO: 83) and 
CPRB (SEQ ID NO: 84) proteins from C. tropicalis ATCC 20336 and alignment of these amino
acid sequences. Asterisks indicate residues which are not conserved. 

Figures 15A-15M show the complete DNA sequences including regulatory and
coding regions for the following genes from C. tropicalis ATCC 20336: CYP52A1A (SEQ ID
NO: 85), CYP52A2A (SEQ ID NO: 86), CYP52A2B (SEQ ID NO: 87), CYP52A3A (SEQ ID NO:
88), CYP52A3B (SEQ ID NO: 89), CYP52A5A (SEQ ID NO: 90), CYP52A5B (SEQ ID NO: 91),
CYP52A8A (SEQ ID NO: 92), CYP52A8B (SEQ ID NO: 93), and CYP52D4A (SEQ ID NO: 94).
Figures 15A-15M show regulatory and coding region alignment of these sequences. Asterisks
indicate conserved nucleotides. Bold indicates protein coding nucleotides; the start and stop
codons are underlined.

Figures 16A-16C show the amino acid sequences of the CYP52A1A (SEQ
ID NO: 95), CYP52A2A (SEQ ID NO: 96), CYP52A2B (SEQ ID NO: 97), CYP52A3A (SEQ ID
NO: 98), CYP52A3B (SEQ ID NO: 99), CYP52A5A (SEQ ID NO: 100), CYP52A5B (SEQ ID
NO: 101), CYP52A8A (SEQ ID NO: 102), CYP52A8B (SEQ ID NO: 103) and CYP52D4A (SEQ
ID NO: 104) proteins from C. tropicalis ATCC 20336. Asterisks indicate identical residues and
dots indicate conserved residues.

Figure 17 is a diagrammatic representation of the pTAg PCR product cloning
vector (commercially available from R&D Systems, Minneapolis, MN).

Figure 18 is a plot of the log ratio (U/C) of unknown target DNA product to
competitor DNA product versus the concentration of competitor mRNA. The plot is used to
calculate the target messenger RNA concentration in a quantitative competitive reverse
transcription polymerase chain reaction (QC-RT-PCR).

Figure 19 is a graph showing the relative induction of C. tropicalis ATCC 20962 
CYP52A5A (SEQ ID NO: 90) by the addition of the fatty acid substrate Emersol® 267 to the 
growth medium. 

Figure 20 is a graph showing the induction of C. tropicalis ATCC 20962 CYP52
and CPR genes by Emersol® 267. P450 genes CYP52A3A (SEQ ID NO: 88), CYP52A3B (SEQ
ID NO: 89), and CYP52D4A (SEQ ID NO: 94) are expressed at levels below the detection level
of the QC-RT-PCR assay.

Figure 21 is a scheme to integrate selected genes into the genome of Candida 
tropicalis strains and recovery of URA3A selectable marker. 

Figure 22 is a schematic representation of the transformation of C. tropicalis
H5343 ura- with CYP and/or CPR genes. Only one URA3 locus needs to be functional. There
are a total of 6 possible ura targets (5 ura3A loci: 2 pox4 disruptions, 2 pox5 disruptions and 1
ura3A locus; and 1 ura3B locus).

Figure 23 is the complete DNA sequence (SEQ ID NO: 105) encoding URA3A 
from C. tropicalis ATCC 20336 and the amino acid sequence of the encoded protein (SEQ ID 
NO: 106). 
Figure 24 is a schematic representation of the plasmid pURAin, the base vector
for integrating selected genes into the genome of C. tropicalis. The detailed construction of
pURAin is described in the text.

Figure 25 is a schematic representation of the plasmid pNEB193 cloning vector 
(commercially available from New England Biolabs, Beverly, MA). 

Figure 26 is a diagrammatic representation of the plasmid pPA15 containing the
truncated CYP52A2A gene present in the pTriplEx vector. A detailed restriction map of only the 
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of 
transcription is indicated by an arrow under the open reading frame. 

Figure 27 is a schematic representation of pURA2in. The base vector was
constructed in pNEB193, which contains the 8 bp recognition sequences for Asc I, Pac I and Pme
I. URA3A (SEQ ID NO: 105) and CYP52A2A (SEQ ID NO: 86) do not contain these 8 bp
recognition sites. URA3A is inverted so that the transforming fragment will attempt to
recircularize prior to integration. An Asc I/Pme I fragment was used to transform H5343 ura-.

Figure 28 shows a scheme to detect integration of the CYP52A2A gene (SEQ ID NO:
86) into the genome of H5343 ura-. In all cases, hybridization band intensity could reflect the
number of integrations.

Figure 29 is a diagrammatic representation of the plasmid pPA57 containing the
truncated CYP52A3A gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame.

Figure 30 is a diagrammatic representation of the plasmid pPA62 containing the 
truncated CYP52A3B gene present in the pTriplEx vector. A detailed restriction map of only the 
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of 
transcription is indicated by an arrow under the open reading frame. 

Figure 31 is a diagrammatic representation of the plasmid pPAL3 containing the 
truncated CYP52A5A gene present in the pTriplEx vector. A detailed restriction map of only the 
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of 
transcription is indicated by an arrow under the open reading frame.

Figure 32 is a diagrammatic representation of the plasmid pPA5 containing the 
truncated CYP52A5B gene present in the pTriplEx vector. A detailed restriction map of only the
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of 
transcription is indicated by an arrow under the open reading frame. 

Figure 33 is a diagrammatic representation of the plasmid pPA18 containing the 
truncated CYP52D4A gene present in the pTriplEx vector. A detailed restriction map of only the 
sequenced region is shown at the top. The bar indicates the open reading frame. The direction of
transcription is indicated by an arrow under the open reading frame. 

Figure 34 is a graph showing the expression of CYP52A1 (SEQ ID NO: 85),
CYP52A2 (SEQ ID NO: 86) and CYP52A5 genes (SEQ ID NOS: 90 and 91) from C. tropicalis
20962 in a fermentor run upon the addition of amounts of the substrate oleic acid or tridecane in
a spiking experiment.

Figure 35 depicts a scheme used for the extraction and analysis of diacids and
monoacids from fermentation broths.

Figure 36 is a graph showing the induction of expression of CYP52A1A, 
CYP52A2A and CYP52A5A in a fermentor run upon addition of the substrate octadecane. No 
induction of CYP52A3A or CYP52A3B was observed under these conditions.



DESCRIPTION OF THE PREFERRED EMBODIMENTS
Diacid productivity is improved according to the present invention by selectively
increasing the levels of enzymes which are known to be important to the oxidation of organic
substrates such as fatty acids composing the desired feed. According to the present invention, ten
CYP genes and two CPR genes of C. tropicalis have been identified and characterized that relate
to participation in the ω-hydroxylase complex catalyzing the first step in the ω-oxidation pathway.
In addition, a novel quantitative competitive reverse transcription polymerase chain reaction
(QC-RT-PCR) assay is used to measure gene expression in the fermentor under conditions of
induction by one or more organic substrates as defined herein. Based upon QC-RT-PCR results,
three CYP genes, CYP52A1, CYP52A2 and CYP52A5, have been identified as being of greater
importance for the ω-oxidation of long chain fatty acids. Amplification of the CPR gene copy
number improves productivity. The QC-RT-PCR assay indicates that both CYP and CPR genes
appear to be under tight regulatory control.

In accordance with the present invention, a method for discriminating members of
a gene family by quantifying the amount of target mRNA in a sample is provided which
includes a) providing an organism containing a target gene; b) culturing the organism with an
organic substrate which causes upregulation in the activity of the target gene; c) obtaining a
sample of total RNA from the organism at a first point in time; d) combining at least a portion of
the sample of the total RNA with a known amount of competitor RNA to form an RNA mixture,
wherein the competitor RNA is substantially similar to the target mRNA but has a lesser number
of nucleotides compared to the target mRNA; e) adding reverse transcriptase to the RNA mixture
in a quantity sufficient to form corresponding target DNA and competitor DNA; f) conducting a
polymerase chain reaction in the presence of at least one primer specific for at least one
substantially non-homologous region of the target DNA within the gene family, the primer also
specific for the competitor DNA; g) repeating steps (c-f) using increasing amounts of the
competitor RNA while maintaining a substantially constant amount of target RNA; h)
determining the point at which the amount of target DNA is substantially equal to the amount of
competitor DNA; i) quantifying the results by comparing the ratio of the concentration of
unknown target to the known concentration of competitor; and j) obtaining a sample of total
RNA from the organism at another point in time and repeating steps (d-i).
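Steps h) and i) amount to locating the equivalence point of the competitor titration, where target and competitor products are equal, which is how a plot such as that of Figure 18 is read. The sketch below is illustrative only and is not the analysis procedure used in this work: it assumes that log10(U/C) is approximately linear in log10 of the competitor concentration, fits that line by ordinary least squares, and returns the competitor concentration at which U/C = 1. The function name `estimate_target_concentration` and the example numbers are hypothetical, and no correction is made for the size difference between target and competitor products.

```python
import math
from typing import Sequence


def estimate_target_concentration(
    competitor_conc: Sequence[float],
    target_over_competitor: Sequence[float],
) -> float:
    """Return the competitor concentration at which U/C = 1 (hypothetical helper).

    Fits log10(U/C) = slope * log10(competitor) + intercept by ordinary least
    squares; the equivalence point is where the fitted line crosses zero.
    """
    if len(competitor_conc) != len(target_over_competitor):
        raise ValueError("inputs must have the same length")
    if len(competitor_conc) < 2:
        raise ValueError("need at least two titration points")
    xs = [math.log10(c) for c in competitor_conc]
    ys = [math.log10(r) for r in target_over_competitor]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or sxy == 0:
        raise ValueError("titration series does not define a usable slope")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Solve slope * x + intercept = 0 for x, then undo the log10.
    return 10 ** (-intercept / slope)


# Hypothetical titration: competitor RNA in arbitrary units, U/C from band intensities.
competitor = [1e-4, 1e-3, 1e-2]
ratios = [10.0, 1.0, 0.1]
print(estimate_target_concentration(competitor, ratios))  # close to 1e-3
```

The result is in the same units as the competitor concentrations. If product amounts are read from band intensity that scales with product mass, U/C would first need to be scaled by the ratio of competitor length to target length; that correction is left out here to keep the sketch minimal.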

In addition, modification of existing promoters and/or the isolation of alternative 
promoters provides increased expression of CYP and CPR genes. Strong promoters are obtained 
from at least four sources: random or specific modifications of the CYP52A2 promoter,
CYP52A5 promoter, CYP52A1 promoter, the selection of a strong promoter from available
Candida β-oxidation genes such as POX4 and POX5, or screening to select another suitable
Candida promoter. 

Promoter strength can be directly measured using QC-RT-PCR to measure CYP
and CPR gene expression in Candida cells isolated from fermentors. Enzymatic assays and 
antibodies specific for CYP and CPR proteins are used to verify that increased promoter strength 
is reflected by increased synthesis of the corresponding enzymes. Once a suitable promoter is 
identified, it is fused to the selected CYP and CPR genes and introduced into Candida for 
construction of a new improved production strain. It is contemplated that the coding region of 
the CYP and CPR genes can be fused to suitable promoters or other regulatory sequences which 
are well known to those skilled in the art. 

In accordance with the present invention, studies on C. tropicalis ATCC 20336 
have identified six unique CYP genes and four potential alleles. QC-RT-PCR analyses of cells 
internal carbon-carbon double bond and at least one terminal methyl group, a terminal carboxyl
group and/or a terminal functional group which is oxidizable to a carboxyl group by 
biooxidation. Examples of such compounds include but are not limited to 3,6-dimethyl-1,4-cyclohexadiene;
3-methylcyclohexene; 3-methyl-1,4-cyclohexadiene and the like.

Examples of the aromatic compounds that can be used herein include but are not
limited to arenes such as o-, m-, p-xylene; o-, m-, p-methyl benzoic acid; dimethyl pyridine, and
the like. The organic substrate can also contain other functional groups that are biooxidizable to
carboxyl groups such as an aldehyde or alcohol group. The organic substrate can also contain
other functional groups that are not biooxidizable to carboxyl groups and do not interfere with
the biooxidation such as halogens, ethers, and the like.

Examples of saturated fatty acids which may be applied to cells incorporating the
present CYP and CPR genes include caproic, enanthic, caprylic, pelargonic, capric, undecylic,
lauric, myristic, pentadecanoic, palmitic, margaric, stearic, arachidic, behenic acids and
combinations thereof. Examples of unsaturated fatty acids which may be applied to cells
incorporating the present CYP and CPR genes include palmitoleic, oleic, erucic, linoleic,
linolenic acids and combinations thereof. Alkanes and fractions of alkanes may be applied
which include chain lengths from C12 to C24 in any combination. Examples of preferred fatty
acid mixtures are Emersol® 267 and Tallow, both commercially available from Henkel
Chemicals Group, Cincinnati, OH. The typical fatty acid composition of Emersol® 267 and
Tallow is as follows:
C18:3
C20:0
C20:1

0.5%
0.9%



The following examples are meant to illustrate but not to limit the invention. All
relevant microbial strains and plasmids are described in Table 1 and Table 2, respectively. 
Table 1. List of Escherichia coli and Candida tropicalis strains

E. coli
STRAIN          GENOTYPE                                                  SOURCE

XL1Blue-MRF'    endA1, gyrA96, hsdR17, lac, recA1, relA1, supE44,         Stratagene, La Jolla, CA
                thi-1, [F' lacIqZΔM15, proAB, Tn10]

BM25.8          supE44, thi Δ(lac-proAB) [F' traD36, proAB+,              Clontech, Palo Alto, CA
                lacIqZΔM15] λimm434 (kanR) P1 (camR) hsdR (rK12- mK12-)

XLOLR           Δ(mcrA)183 Δ(mcrCB-hsdSMR-mrr)173 endA1 thi-1 recA1       Stratagene, La Jolla, CA
                gyrA96 relA1 lac [F' proAB lacIqZΔM15 Tn10 (TetR)] Su-
                (nonsuppressing) λR (lambda resistant)

C. tropicalis
STRAIN          GENOTYPE                                                  SOURCE

                ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,
                pox5::ura3A/pox5::URA3A

                ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,
                pox5::ura3A/pox5::URA3A, ura3-

                ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,
                pox5::ura3A/pox5::URA3A, ura3::URA3A-CYP52A2A

                ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,
                pox5::ura3A/pox5::URA3A, ura3::URA3A-CYP52A3A

                ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,                   Henkel
                pox5::ura3A/pox5::URA3A, ura3::URA3A-CPRB

HDC15           ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,                   Henkel
                pox5::ura3A/pox5::URA3A, ura3::URA3A-CYP52A5A

HDC20           pox4A::ura3A/pox4B::ura3A, pox5::ura3A/pox5::URA3A,       Henkel
                ura3::URA3A-CYP52A2A + CPRB (CYP and CPR have opposite
                5' to 3' orientation with respect to each other)

HDC23           ura3A/ura3B, pox4A::ura3A/pox4B::ura3A,                   Henkel
                pox5::ura3A/pox5::URA3A, ura3::URA3A-CYP52A2A + CPRB
                (CYP and CPR have the same 5' to 3' orientation with
                respect to each other)
Table 2. List of plasmids isolated from genomic libraries and constructed for use
in gene integrations.

Plasmid       Base vector   Insert                 Insert size      Plasmid size
              Description

pURAin        pNEB193       URA3A                  1706 bp          4399 bp
              pNEB193 with the URA3A gene inserted in the AscI-PmeI site, generating a PacI site

pURA2in       pURAin        CYP52A2A               2230 bp          6629 bp
              pURAin containing a PCR CYP52A2A allele containing PacI restriction sites

pURAREDBin    pURAin        CPRB                   3266 bp          7665 bp
              pURAin containing a PCR CPRB allele containing PacI restriction sites

pHKM1         pTriplEx      Truncated CPRA gene    Approx. 3.8 kb   Approx. 7.4 kb
              A truncated CPRA gene obtained by screening the first library, containing the
              5' untranslated region and 1.2 kb of the open reading frame

pHKM4         pTriplEx      Truncated CPRA gene    Approx. 5 kb     Approx. 8.6 kb
              A truncated CPRA gene obtained by screening the second library, containing the
              3' untranslated region end sequence

pHKM9         pBK-CMV       CPRB gene              Approx. 5.3 kb   Approx. 9.8 kb
              CPRB allele isolated from the third library

pHKM11        pBK-CMV       CYP52A1A               Approx. 5 kb     Approx. 9.5 kb
              CYP52A1A isolated from the third library

pHKM12        pBK-CMV       CYP52A8A               Approx. 7.5 kb   Approx. 12 kb
              CYP52A8A isolated from the third library

pHKM13        pBK-CMV       CYP52D4A               Approx. 7.3 kb   Approx. 11.8 kb
              CYP52D4A isolated from the third library

pHKM14        pBK-CMV       CYP52A2B               Approx. 6 kb     Approx. 10.5 kb
              CYP52A2B isolated from the third library

pHKM15        pBK-CMV       CYP52A8B               Approx. 6.6 kb   Approx. 11.1 kb
              CYP52A8B isolated from the third library

pPAL3         pTriplEx      CYP52A5A               4.4 kb           Approx. 8.1 kb
              CYP52A5A isolated from the 1st library

pPA5          pTriplEx      CYP52A5B               4.1 kb           Approx. 7.8 kb
              CYP52A5B isolated from the 2nd library

pPA15         pTriplEx      CYP52A2A               6.0 kb           Approx. 9.7 kb
              CYP52A2A isolated from the 2nd library

pPA57         pTriplEx      CYP52A3A               5.5 kb           Approx. 9.2 kb
              CYP52A3A isolated from the 2nd library

pPA62         pTriplEx      CYP52A3B               6.0 kb           Approx. 9.7 kb
              CYP52A3B isolated from the 2nd library

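Because each plasmid in Table 2 is its base vector plus an insert, plasmid size minus insert size should be roughly constant for each base vector, which gives a quick sanity check on the tabulated sizes. The sketch below transcribes the table values (the "Approx." kb values converted to bp, so those differences are approximate too) and groups the differences by base vector. It is a consistency check only, not a restriction-map analysis, and `implied_backbones` is a hypothetical helper.

```python
# Values transcribed from Table 2: plasmid -> (base vector, insert bp, plasmid bp).
TABLE_2 = {
    "pURA2in": ("pURAin", 2230, 6629),
    "pURAREDBin": ("pURAin", 3266, 7665),
    "pHKM1": ("pTriplEx", 3800, 7400),
    "pHKM4": ("pTriplEx", 5000, 8600),
    "pHKM9": ("pBK-CMV", 5300, 9800),
    "pHKM11": ("pBK-CMV", 5000, 9500),
    "pHKM12": ("pBK-CMV", 7500, 12000),
    "pHKM13": ("pBK-CMV", 7300, 11800),
    "pHKM14": ("pBK-CMV", 6000, 10500),
    "pHKM15": ("pBK-CMV", 6600, 11100),
    "pPAL3": ("pTriplEx", 4400, 8100),
    "pPA5": ("pTriplEx", 4100, 7800),
    "pPA15": ("pTriplEx", 6000, 9700),
    "pPA57": ("pTriplEx", 5500, 9200),
    "pPA62": ("pTriplEx", 6000, 9700),
}


def implied_backbones(table):
    """Group (plasmid size - insert size) in bp by base vector."""
    backbones = {}
    for base, insert_bp, plasmid_bp in table.values():
        backbones.setdefault(base, set()).add(plasmid_bp - insert_bp)
    return backbones


for base, sizes in sorted(implied_backbones(TABLE_2).items()):
    print(base, sorted(sizes))
```

With the values as tabulated, the two exact-bp rows both give 4399 bp, matching the listed size of pURAin itself; the pBK-CMV rows all give about 4.5 kb, and the pTriplEx rows give about 3.6 to 3.7 kb, within the rounding of the approximate entries.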


EXAMPLE 1 

Purification of Genomic DNA from Candida tropicalis ATCC 20336 
A. Construction of Genomic Libraries 
50 ml of YEPD broth (see Chart) was inoculated with a single colony of C.
tropicalis 20336 from a YEPD agar plate and grown overnight at 30°C. 5 ml of the overnight
culture was inoculated into 100 ml of fresh YEPD broth and incubated at 30°C for 4 to 5 hr with 
shaking. Cells were harvested by centrifugation, washed twice with sterile distilled water and 
resuspended in 4 ml of spheroplasting buffer (1 M Sorbitol, 50 mM EDTA, 14 mM 
mercaptoethanol) and incubated for 30 min at 37°C with gentle shaking. 0.5 ml of 2 mg/ml 
zymolyase (ICN Pharmaceuticals, Inc., Irvine, CA) was added and incubated at 37°C with gentle 
shaking for 30 to 60 min. Spheroplast formation was monitored by SDS lysis. Spheroplasts 
were harvested by brief centrifugation (4,000 rpm, 3 min) and were washed once with the 
spheroplast buffer without mercaptoethanol. Harvested spheroplasts were then suspended in 4 
ml of lysis buffer (0.2 M Tris/pH 8.0, 50 mM EDTA, 1% SDS) containing 100 ug/ml RNase 
(Qiagen Inc., Chatsworth, CA) and incubated at 37°C for 30 to 60 min. 
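The protocols in this example give some centrifugation speeds in rpm (for instance 4,000 rpm above) and others as relative centrifugal force (for instance 13,000 x g in section B). Converting between the two requires the rotor radius, which is not stated. The sketch below uses the standard relation RCF = 1.118 x 10^-5 x r x rpm^2, with r in cm; the 10 cm radius in the example is a hypothetical value chosen for illustration, not the rotor used in the original work.

```python
import math

RCF_CONSTANT = 1.118e-5  # standard factor in RCF = k * r(cm) * rpm^2


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a given RCF at a given rotor radius."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


radius = 10.0  # hypothetical rotor radius in cm
print(round(rcf_from_rpm(4000, radius)))   # the 4,000 rpm spheroplast spin
print(round(rpm_from_rcf(13000, radius)))  # the 13,000 x g spins in section B
```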

Proteins were denatured and extracted twice with an equal volume of
chloroform/isoamyl alcohol (24:1) by gently mixing the two phases by hand inversions. The two
phases were separated by centrifugation at 10,000 rpm for 10 min and the aqueous phase
containing the DNA was recovered. To the aqueous layer NaCl was
added to a final concentration of 0.2 M and the DNA was precipitated by adding 2 vol of ethanol.
Precipitated DNA was spooled with a clean glass rod and resuspended in TE buffer (10 mM 
Tris/pH 8.0, 1 mM EDTA) and allowed to dissolve overnight at 4°C. To the dissolved DNA,
RNase free of any DNase activity (Qiagen Inc., Chatsworth, CA) was added to a final
concentration of 50 ug/ml and incubated at 37°C for 30 min. Then protease (Qiagen Inc.,
Chatsworth, CA) was added to a final concentration of 100 ug/ml and incubated at 55 to 60°C
for 30 min. The solution was extracted once with an equal volume of phenol/chloroform/isoamyl
alcohol (25:24:1) and once with an equal volume of chloroform/isoamyl alcohol (24:1). To the
aqueous phase 0.1 vol of 3 M sodium acetate and 2 volumes of ice-cold ethanol (200 proof) were
added and the high molecular weight DNA was spooled with a glass rod and dissolved in 1 to 2
ml of TE buffer.
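Several steps above add a concentrated stock to reach a stated final concentration (NaCl to 0.2 M, RNase to 50 ug/ml, protease to 100 ug/ml) or add a fixed fraction of a volume (0.1 vol of 3 M sodium acetate). The arithmetic is the mass balance C_stock x v = C_final x (V + v). The sketch below assumes the starting solution contains none of the added solute; the 5 M NaCl stock and the 4 ml volume in the example are hypothetical values, not ones stated in the protocol.

```python
def final_concentration(stock: float, added_volume: float, initial_volume: float) -> float:
    """Concentration after mixing added_volume of stock into a solute-free initial_volume."""
    return stock * added_volume / (initial_volume + added_volume)


def volume_to_add(stock: float, target: float, initial_volume: float) -> float:
    """Volume of stock that brings a solute-free initial_volume to the target concentration."""
    if not 0 < target < stock:
        raise ValueError("target must be positive and below the stock concentration")
    # Rearranged from stock * v = target * (initial_volume + v).
    return target * initial_volume / (stock - target)


# 0.1 vol of 3 M sodium acetate added to 1 volume of aqueous phase (before ethanol).
print(final_concentration(3.0, 0.1, 1.0))  # about 0.27 M
# Hypothetical 5 M NaCl stock added to a 4 ml aqueous phase to reach 0.2 M.
print(volume_to_add(5.0, 0.2, 4.0))        # about 0.167 ml
```

The same functions apply to the enzyme additions if concentrations are expressed in ug/ml instead of molarity.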



B. Genomic DNA Preparation for PCR 
Amplification of CYP and CPR Genes 

Five ml of YPD medium was inoculated with a single colony and grown at
30°C overnight. The culture was centrifuged for 5 min at 1200 x g. The supernatant was
removed by aspiration and 0.5 ml of a sorbitol solution (0.9 M sorbitol, 0.1 M Tris-Cl pH 8.0,
0.1 M EDTA) was added to the pellet. The pellet was resuspended by vortexing and 1 ul of
2-mercaptoethanol and 50 ul of a 10 ug/ml zymolyase solution were added to the mixture. The
tube was incubated at 37°C for 1 hr on a rotary shaker (200 rpm). The tube was then
centrifuged for 5 min at 1200 x g and the supernatant was removed by aspiration. The protoplast
pellet was resuspended in 0.5 ml 1x TE (10 mM Tris-Cl pH 8.0, 1 mM EDTA) and transferred
to a 1.5 ml microcentrifuge tube. The protoplasts were lysed by the addition of 50 ul 10% SDS
followed by incubation at 65°C for 20 min. Next, 200 ul of 5 M potassium acetate was added
and after mixing, the tube was incubated on ice for at least 30 min. Cellular debris was removed
by centrifugation at 13,000 x g for 5 min. The supernatant was carefully removed and
transferred to a new microfuge tube. The DNA was precipitated by the addition of 1 ml 100%
(200 proof) ethanol followed by centrifugation for 5 min at 13,000 x g. The DNA pellet was
washed with 1 ml 70% ethanol followed by centrifugation for 5 min at 13,000 x g. After
partially drying the DNA under a vacuum, it was resuspended in 200 ul of 1x TE. The DNA
concentration was determined by the ratio of the absorbance at 260 nm / 280 nm (A260/A280).
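By common laboratory convention, an A260 of 1.0 (1 cm path) corresponds to roughly 50 ug/ml of double-stranded DNA, while the A260/A280 ratio is used mainly as a purity index (about 1.8 for protein-free DNA). The sketch below applies those conventional factors to a spectrophotometer reading; the dilution factor and the absorbance values in the example are hypothetical.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # conventional factor for dsDNA, 1 cm path length


def dsdna_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Approximate dsDNA concentration of the undiluted sample in ug/ml."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def a260_a280_ratio(a260: float, a280: float) -> float:
    """Purity index; values near 1.8 are usually taken to indicate little protein."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280


# Hypothetical reading: DNA diluted 1:50, A260 = 0.20, A280 = 0.11.
conc = dsdna_ug_per_ml(0.20, dilution_factor=50)
print(conc)                          # ug/ml in the 200 ul resuspension
print(conc * 0.200)                  # total yield in ug for 200 ul
print(a260_a280_ratio(0.20, 0.11))
```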
EXAMPLE 2

Construction of Candida tropicalis 20336 Genomic Libraries
Three genomic libraries of C. tropicalis were constructed, two at Clontech
Laboratories, Inc., (Palo Alto, CA) and one at Henkel Corporation (Cincinnati, OH).

A. Clontech Libraries 

The first Clontech library was made as follows: Genomic DNA was prepared
from C. tropicalis 20336 as described above, partially digested with EcoRI and size fractionated
by gel electrophoresis to eliminate fragments smaller than 0.6 kb. Following size fractionation,
several ligations of the EcoRI genomic DNA fragments and lambda (λ) TriplEx™ vector (Figure
1) arms with EcoRI sticky ends were packaged into λ phage heads under conditions designed
to obtain one million independent clones. The second genomic library was constructed as
follows: Genomic DNA was digested partially with Sau3AI and size fractionated by gel
electrophoresis. The DNA fragments were blunt ended using standard protocols as described,
e.g., in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor
Press, USA (1989), incorporated herein by reference. The strategy was to fill in the Sau3AI
overhangs with Klenow polymerase (Life Technologies, Grand Island, NY) followed by
digestion with S1 nuclease (Life Technologies, Grand Island, NY). After S1 nuclease digestion
the fragments were end filled one more time with Klenow polymerase to obtain the final
blunt-ended DNA fragments. EcoRI linkers were ligated to these blunt-ended DNA fragments
followed by ligation into the λTriplEx vector. The resultant library contained approximately
2 x 10^6 independent clones with an average insert size of 4.5 kb.
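How many independent clones a genomic library needs depends on insert size and genome size. A common estimate is the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), where f is the ratio of insert size to genome size and P is the desired probability that any given sequence is represented. The sketch below applies it to the 4.5 kb average insert size stated above; the genome size is an assumption (about 15 Mb is used here purely for illustration) and should be replaced with a measured value.

```python
import math


def clones_required(insert_bp: float, genome_bp: float, probability: float) -> int:
    """Clarke-Carbon estimate of independent clones needed for the given probability."""
    f = insert_bp / genome_bp
    # log1p keeps precision when f is very small.
    return math.ceil(math.log1p(-probability) / math.log1p(-f))


def probability_represented(insert_bp: float, genome_bp: float, n_clones: int) -> float:
    """Probability that a given sequence is present among n_clones random clones."""
    f = insert_bp / genome_bp
    return 1.0 - math.exp(n_clones * math.log1p(-f))


GENOME_BP = 15e6   # assumed genome size, for illustration only
INSERT_BP = 4.5e3  # average insert size reported for the second library

print(clones_required(INSERT_BP, GENOME_BP, 0.99))
print(probability_represented(INSERT_BP, GENOME_BP, 2_000_000))
```

Under this assumed genome size, on the order of 15,000 clones would give 99% coverage, so a library of about 2 x 10^6 clones is expected to be highly redundant; the formula assumes random, unbiased fragmentation, which partial digestion only approximates.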

B. Henkel Library 

The third genomic library was constructed at Henkel Corporation using λZAP
Express™ vector (Stratagene, La Jolla, CA) (Figure 2). Genomic DNA was partially digested
with Sau3AI and fragments in the range of 6 to 12 kb were purified from an agarose gel after
electrophoresis of the digested DNA. These DNA fragments were then ligated to BamHI-digested
λZAP Express™ vector arms according to the manufacturer's protocols. Three ligations
were set up to obtain approximately 9.8 x 10^5 independent clones. All three libraries were
pooled and amplified according to manufacturer instructions to obtain high-titre (>10^9 plaque
"^rr^lof coKXUB.ue.MRF * coW XLlBluc-MRF' were determined.

EXAMPLE 3
Screening of Genomic Libraries 

te genomic libraries conaruced m the* vectors c rf 

M Thec.onings.«em«npH* -*» ^a.w.WIBI 

flankedbytafsiteffigurel). When XTnplEx ' " stheexcisionaIla 

(^tfC—^^^^T^ The 

ci ^onofp l asnndpT ri p E x to n,« 1 eph^eXTnp 1 ^* b 

i --i «rv PMV from phage XZAr express 
^harismof^cisionofp.asnudpBK-CMVfe >P ^ m £ ^ 

sud , a sXL0R(S««agene).B0ftpTriplExandpBK-CMVcan p 



1) Colony Lifts 




A single colony of £ C oli Rlo< d 
«* ^yoin, 10DlMMgSO< ^ „ ^ ** 5 «, of LB ^ 50 

JI:-T < " ,,,clbr "■*-•-* 

™ - *« - «-B aga, 50 . . " ^ fc 1 «• ^ 



2) DNA Hybridizations 



Membranes were dried 

Vaacc„. ing , ommufacWspro ^W 
» T 5 X SS C Oil* SDS (i„ a v 1 "• — «*- 

\ /obDS 0navolume 
equivalent to 2 ml/cm2 of membrane) for 15 min each. The hybridization signal was then
generated and detected with Hyperfilm ECL™ (Amersham) according to manufacturer's
protocols. Membranes were aligned to plates containing bacterial colonies from which colony
lifts were performed and colonies corresponding to positive signals on X-ray film were then
isolated and propagated in LB broth. Plasmid DNAs were isolated from these cultures and
analyzed by restriction enzyme digestions and by DNA sequencing.

B. Screening Genomic Libraries (Plaque Form) 
1) λ Library Plating

E. coli XL1Blue-MRF' cells were grown overnight in LB medium (25 ml)
containing 10 mM MgSO4 and 0.2% maltose at 37°C, 250 rpm. Cells were then centrifuged
(2,200 x g for 10 min) and resuspended in 0.5 volumes of 10 mM MgSO4. 500 ul of this E. coli
culture was mixed with a phage suspension containing 25,000 amplified lambda phage particles
and incubated at 37°C for 15 min. To this mixture 6.5 ml of NZCYM top agarose (maintained at
60°C) (see Chart) was added and plated on 80-100 ml NZCYM agar (see Chart) present in a
150 mm Petri dish. Phage were allowed to propagate overnight at 37°C to obtain discrete
plaques. After overnight growth, plates were stored in a refrigerator for 1-2 hr before plaque lifts
were performed.
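Plating a set number of plaque-forming units (25,000 per 150 mm plate above) from a high-titre amplified library (>10^9 pfu/ml) generally requires a serial dilution first. The sketch below computes the volume of diluted stock to plate from the titre and the dilution applied; the 10^9 pfu/ml titre and the 1:1,000 dilution in the example are hypothetical values, not ones reported for these libraries.

```python
def plating_volume_ul(desired_pfu: float, titre_pfu_per_ml: float, dilution: float = 1.0) -> float:
    """Microlitres of phage stock (after the given dilution) that contain desired_pfu."""
    if titre_pfu_per_ml <= 0 or dilution <= 0:
        raise ValueError("titre and dilution must be positive")
    pfu_per_ul = titre_pfu_per_ml * dilution / 1000.0  # 1 ml = 1000 ul
    return desired_pfu / pfu_per_ul


TITRE = 1e9  # hypothetical amplified-library titre, pfu/ml
print(plating_volume_ul(25_000, TITRE, dilution=1e-3))  # ul of a 1:1,000 dilution per 150 mm plate
```

Plating directly from an undiluted 10^9 pfu/ml stock would call for a fraction of a microlitre, which is why a dilution step is assumed here.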

2) Plaque Lift and DNA Hybridizations

Magna Lift™ nylon membranes (Micron Separations, Inc., Westborough, MA)
were placed on the agar surface in complete contact with λ plaques and transfer of plaques to
nylon membranes was allowed to proceed for 5 min at RT. After plaque transfer the membrane
was placed on 2 sheets of Whatman 3M™ (Whatman, Hillsboro, OR) filter paper saturated with
a 0.5 N NaOH, 1.0 M NaCl solution and left for 10 min at RT to denature DNA. Excess
denaturing solution was removed by blotting briefly on dry Whatman 3M paper. Membranes
were then transferred to 2 sheets of Whatman 3M™ paper saturated with 0.5 M Tris-HCl (pH
8.0), 1.5 M NaCl and left for 5 min to neutralize. Membranes were then briefly washed in
200-500 ml of 2X SSC, dried by air and baked for 30-40 min at 80°C. The membranes were then
probed with labelled DNA.
Membranes were prewashed with a 200-500 ml solution of 5X SSC, 0.5% SDS,
1 mM EDTA (pH 8.0) for 1-2 hr at 42°C with shaking (60 rpm) to remove bacterial debris
from the membranes. The membranes were prehybridized for 1-2 hr at 42°C in ECL Gold™
buffer (Amersham) containing 0.5 M NaCl and 5% blocking reagent (in a volume equivalent to
0.125-0.25 ml/cm2 of membrane). DNA fragments that were used as probes were purified from
agarose gel using a QIAEX II™ gel extraction kit (Qiagen Inc., Chatsworth, CA) according to the
manufacturer's protocol and labeled using an Amersham ECL™ direct nucleic acid labeling kit
(Amersham). Labeled DNA (5-10 ng/ml hybridization solution) was added to the prehybridized
membranes and the hybridization was allowed to proceed overnight. The following day,
membranes were washed twice with shaking (60 rpm) at 42°C for 20 min each time in a buffer
(in a volume equivalent to 2 ml/cm2 of membrane) containing either 0.1X (high stringency) or
0.5X (low stringency) SSC, 0.4% SDS and 360 g/l urea. This was followed by two 5 min
washes at room temperature in 2X SSC (in a volume equivalent to 2 ml/cm2 of membrane).
Hybridization signals were generated using the ECL™ nucleic acid detection reagent and
detected using Hyperfilm ECL™ (Amersham).
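The prehybridization and wash volumes above are specified per unit membrane area (0.125-0.25 ml/cm2 and 2 ml/cm2). For a round plaque-lift membrane the area is pi x (d/2)^2. The sketch below turns those per-area figures into working volumes; the 13.7 cm membrane diameter in the example is a hypothetical value.

```python
import math


def membrane_area_cm2(diameter_cm: float) -> float:
    """Area of a round membrane in cm^2."""
    return math.pi * (diameter_cm / 2.0) ** 2


def buffer_volume_ml(area_cm2: float, ml_per_cm2: float) -> float:
    """Buffer volume for a per-area specification such as 2 ml/cm^2."""
    return area_cm2 * ml_per_cm2


area = membrane_area_cm2(13.7)  # hypothetical membrane diameter in cm
print(f"membrane area: {area:.1f} cm^2")
print(f"prehybridization: {buffer_volume_ml(area, 0.125):.1f}-{buffer_volume_ml(area, 0.25):.1f} ml")
print(f"each stringency wash: {buffer_volume_ml(area, 2.0):.0f} ml")
```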

Agar plugs containing plaques that corresponded to positive signals on the X-ray film were taken from the master plates using the broad end of a Pasteur pipet. Plaques were selected by aligning the plates with the X-ray film; at this stage, multiple plaques were generally taken. Phage particles were eluted from the agar plugs by soaking in 1 ml SM buffer (Sambrook et al., supra) overnight. The phage eluate was then diluted and plated with freshly grown E. coli XL1-Blue MRF' cells to obtain 100 - 500 plaques per 85 mm NZCYM agar plate. Plaques were transferred to Magna Lift nylon membranes as before and probed again using the same probe. Single well-isolated plaques corresponding to signals on the X-ray film were picked by removing agar plugs and eluting the phage by soaking overnight in 0.5 ml SM buffer.
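The replating step aims for 100 - 500 plaques per plate. A minimal sketch of choosing a 10-fold dilution, assuming the eluate titer (pfu/ml) has already been estimated; the titer, plated volume and function name are hypothetical. Because the 100 - 500 window spans less than one log, some titers have no exact 10-fold fit, and the function then returns None.

```python
def pick_tenfold_dilution(titer_pfu_per_ml, plated_ul, low=100, high=500, max_steps=12):
    """Return (fold_dilution, expected_plaques) for the least-diluted 10-fold
    step whose expected plaque count lies in [low, high], or None."""
    # Plaque-forming units contained in the plated volume of undiluted eluate.
    pfu_plated = titer_pfu_per_ml * plated_ul / 1000.0
    for step in range(max_steps + 1):
        fold = 10 ** step
        expected = pfu_plated / fold
        if low <= expected <= high:
            return fold, expected
    return None


# Hypothetical eluate titer of 2 x 10^7 pfu/ml, 10 µl plated per 85 mm plate.
print(pick_tenfold_dilution(2e7, 10))  # -> (1000, 200.0)
```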






C. Conversion of λ Clones to Plasmid Form

The lambda clones isolated were converted to plasmid form for further analysis. Conversion from the plaque to the plasmid form was accomplished by infecting the plaques into E. coli strain BM25.8. The E. coli strain was grown overnight at 31°C, 250 rpm in LB broth containing 10 mM MgSO4 and 0.2% maltose until the OD600 reached 1.1 - 1.4. Ten milliliters of the overnight culture was removed and mixed with 100 µl of 1 M MgCl2. A 200 µl volume of cells was removed, mixed with 150 µl of eluted phage suspension and incubated at 31°C for 30 min. LB broth (400 µl) was added to the tube and incubation was continued at 31°C for 1 hr with shaking at 250 rpm. Then 1 - 10 µl of the infected cell suspension was plated on LB agar containing 100 µg/ml ampicillin (Sigma, St. Louis, MO). Well-isolated colonies were picked and grown overnight in 5 ml LB broth containing 100 µg/ml ampicillin at 37°C, 250 rpm. Plasmid DNA was isolated from these cultures and analyzed. To convert the λZAP Express™ vector to plasmid form, E. coli strains XL1-Blue MRF' and XLOLR were used. The conversion was performed according to the manufacturer's (Stratagene) protocols for single-plaque excision.



EXAMPLE 4
Transformation of C. tropicalis H5343 ura-
A. Transformation of C. tropicalis H5343 by Electroporation

5 ml of YEPD was inoculated with C. tropicalis H5343 ura- from a frozen stock and incubated overnight on a New Brunswick shaker at 30°C and 170 rpm. The next day, 10 µl of the overnight culture was inoculated into 100 ml YEPD and growth was continued at 30°C, 170 rpm. The following day the cells were harvested at an OD600 of 1.0 and the cell pellet was washed once with sterile ice-cold water. The cells were resuspended in ice-cold sterile 35% polyethylene glycol (4,000 MW) to a density of 5 x 10^8 cells/ml. A 0.1 ml volume of cells was used for each electroporation. The following electroporation protocol was followed: 1.0 µg of transforming DNA was added to 0.1 ml cells, along with 5 µg denatured, sheared calf thymus DNA, and the mixture was allowed to incubate on ice for 15 min. The cell solution was then transferred to an ice-cold 0.2 cm electroporation cuvette, tapped to make sure the solution was on the bottom of the cuvette, and electroporated. The cells were electroporated using an Invitrogen electroporator (Carlsbad, CA) at 450 Volts, 200 Ohms and 250 µF.

Following electroporation, 0.9 ml SOS medium (1 M sorbitol, 30% YEPD, 10 mM CaCl2) was added to the suspension. The resulting culture was grown for 1 hr at 30°C, 170 rpm. Following the incubation, the cells were pelleted by centrifugation at 1500 x g for 5 min. The electroporated cells were resuspended in 0.2 ml of 1 M sorbitol and plated on synthetic complete media minus uracil (SC - uracil) (Nelson, supra). In some cases the electroporated cells were plated directly onto SC - uracil. Growth of transformants was monitored for 5 days. After three days, several transformants were picked and transferred to SC - uracil plates for genomic DNA preparation and screening.
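The electroporation settings above translate into a nominal field strength and pulse time constant, and the 5 x 10^8 cells/ml target fixes the resuspension volume once the cells are counted. This sketch assumes the simple RC model (the sample's own resistance, ignored here, shortens the real pulse) and uses a hypothetical count of 10^10 harvested cells; the function names are illustrative.

```python
def field_strength_kv_per_cm(voltage_v: float, gap_cm: float) -> float:
    """Nominal field strength across the cuvette gap."""
    return voltage_v / gap_cm / 1000.0


def nominal_time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """Upper-bound RC time constant: ohm x µF gives µs, divided by 1000 for ms."""
    return resistance_ohm * capacitance_uf / 1000.0


def resuspension_volume_ml(total_cells: float, target_cells_per_ml: float = 5e8) -> float:
    """Volume of ice-cold PEG solution that brings the cells to the target density."""
    return total_cells / target_cells_per_ml


print(field_strength_kv_per_cm(450, 0.2))  # -> 2.25
print(nominal_time_constant_ms(200, 250))  # -> 50.0
print(resuspension_volume_ml(1e10))        # -> 20.0
```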

B. Transformation of C. tropicalis Using Lithium Acetate

The following protocol was used to transform C. tropicalis in accordance with the procedures described in Current Protocols in Molecular Biology, Supplement 5, 13.7.1 (1989), incorporated herein by reference.

5 ml of YEPD was inoculated with C. tropicalis H5343 ura- from a frozen stock and incubated overnight on a New Brunswick shaker at 30°C and 170 rpm. The next day, 10 µl of the overnight culture was inoculated into 50 ml YEPD and growth was continued at 30°C, 170 rpm. The following day the cells were harvested at an OD600 of 1.0. The culture was transferred to a 50 ml polypropylene tube and centrifuged at 1000 X g for 10 min. The cell pellet was resuspended in 10 ml sterile TE (10 mM Tris-Cl and 1 mM EDTA, pH 8.0). The cells were again centrifuged at 1000 X g for 10 min and the cell pellet was resuspended in 10 ml of a sterile lithium acetate solution [LiAc (0.1 M lithium acetate, 10 mM Tris-Cl, pH 8.0, 1 mM EDTA)]. Following centrifugation at 1000 X g for 10 min, the pellet was resuspended in 0.5 ml LiAc. This solution was incubated for one hour at 30°C while shaking gently at 50 rpm. A 0.1 ml aliquot of this suspension was incubated with 5 µg of transforming DNA at 30°C with no shaking for 30 min. A 0.7 ml volume of PEG solution (40% wt/vol polyethylene glycol 3340, 0.1 M lithium acetate, 10 mM Tris-Cl, pH 8.0, 1 mM EDTA) was added and the mixture was incubated at 30°C for 45 min. The tubes were then placed at 42°C for 5 min. A 0.2 ml aliquot was plated on synthetic complete media minus uracil (SC - uracil) (Kaiser et al., Methods in Yeast Genetics, Cold Spring Harbor Laboratory Press, USA, 1994, incorporated herein by reference). Growth of transformants was monitored for 5 days. After three days, several transformants were picked and transferred to SC - uracil plates for genomic DNA preparation and screening.
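Mixing the 0.1 ml cell/DNA suspension with 0.7 ml of 40% PEG solution dilutes the PEG while leaving the lithium acetate unchanged, since both solutions carry 0.1 M lithium acetate. A minimal C1V1 = C2V2 sketch, assuming the volume of transforming DNA is negligible:

```python
def final_concentration(stock_conc: float, stock_volume: float, total_volume: float) -> float:
    """C1*V1 = C2*V2, solved for the final concentration C2."""
    return stock_conc * stock_volume / total_volume


cells_ml, peg_ml = 0.1, 0.7
total_ml = cells_ml + peg_ml  # ignores the small volume of transforming DNA

peg_final = final_concentration(40.0, peg_ml, total_ml)  # % wt/vol polyethylene glycol
# Both the cell suspension (in LiAc) and the PEG solution contain 0.1 M lithium
# acetate, so the two contributions add back up to 0.1 M.
liac_final = (final_concentration(0.1, cells_ml, total_ml)
              + final_concentration(0.1, peg_ml, total_ml))
print(round(peg_final, 2), round(liac_final, 3))  # -> 35.0 0.1
```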

EXAMPLE 5

Plasmid DNA Isolation
Plasmid DNA was isolated from E. coli cultures using a Qiagen plasmid isolation kit (Qiagen Inc., Chatsworth, CA) according to the manufacturer's instructions.
EXAMPLE 6
DNA Sequencing and Analysis






EXAMPLE 7 
PCR Protocols 

PCR amplifications were performed in a Perkin Elmer thermocycler using the enzymes listed in Table 3, including Expand Hi-Fi Taq Polymerase (Boehringer Mannheim, Indianapolis, IN), per the manufacturer's recommendations or as defined in Table 3.



Table 3. PCR amplification conditions used with different primer combinations.

Primer combination | Enzyme | Template denaturing condition | Annealing temp/time | Extension temp/time | Cycle number
3674-41-1/41-2/41-3 + 3674-41-4 | Taq | 94°C/30 sec | 55°C/30 sec | 72°C/1 min | 30
URA Primer 1a + URA Primer 1b | AmpliTaq Gold | 95°C/1 min | 70°C/1 min | 72°C/2 min | 35
URA Primer 2a + URA Primer 2b | AmpliTaq Gold | 95°C/1 min | 70°C/1 min | 72°C/2 min | [illegible]
CYP2A#1 + CYP2A#2 | AmpliTaq Gold | 95°C/1 min | 70°C/1 min | 72°C/2 min | [illegible]
CYP3A#1 + CYP3A#2 | AmpliTaq Gold or UltmaTaq | 95°C/1 min | 70°C/1 min | 72°C/1 min | [illegible]
CPRB#1 + CPRB#2 | Expand Hi-Fi Taq | 94°C/15 sec | 50°C/30 sec | 68°C/3 min | 10
 | | 94°C/15 sec | 50°C/30 sec | 68°C/3 min +20 sec/cycle | 15
CYP5A#1 + CYP5A#2 | Expand Hi-Fi Taq | 94°C/15 sec | 50°C/30 sec | 68°C/3 min | 10
 | | 94°C/15 sec | 50°C/30 sec | 68°C/3 min +20 sec/cycle | 15
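For the Expand Hi-Fi programs in Table 3, the extension time grows by 20 sec each cycle in the second stage. This sketch tabulates per-cycle extension times and sums the hold times of a two-stage program. It assumes the increment begins with the second cycle of the stage (conventions vary between instruments), takes 10 + 15 cycles as illustrative values, and ignores ramp times and any initial denaturation or final extension; the function names are hypothetical.

```python
def stage_extension_times(base_s: int, increment_s: int, cycles: int) -> list:
    """Per-cycle extension times for a stage whose extension grows by a fixed
    increment each successive cycle (the first cycle of the stage uses base_s)."""
    return [base_s + increment_s * i for i in range(cycles)]


def program_seconds(stages) -> int:
    """Sum hold times over stages given as (denature_s, anneal_s, extension_list)."""
    total = 0
    for denature_s, anneal_s, extensions in stages:
        total += len(extensions) * (denature_s + anneal_s) + sum(extensions)
    return total


# 94°C/15 sec, 50°C/30 sec, 68°C/3 min; then the same with +20 sec/cycle.
stage1 = (15, 30, stage_extension_times(180, 0, 10))
stage2 = (15, 30, stage_extension_times(180, 20, 15))
print(stage2[2][-1], program_seconds([stage1, stage2]))  # -> 460 7725
```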



Table 4 below contains a list of primers (SEQ ID NOS: 1-35) used for PCR amplification to 
construct gene integration vectors or to generate probes for gene detection and isolation. 



Table 4. Primer table for PCR amplification to construct gene integration vectors, to generate probes for gene isolation and detection and to obtain DNA sequence of constructs. (A - deoxyadenosine triphosphate [dATP], G - deoxyguanosine triphosphate [dGTP], C - deoxycytidine triphosphate [dCTP], T - deoxythymidine triphosphate [dTTP], Y - dCTP or dTTP, R - dATP or dGTP, W - dATP or dTTP, M - dATP or dCTP, N - dATP or dCTP or dGTP or dTTP).



Target gene(s) | Patent primer name | Lab primer name | Sequence (5'-3') | PCR product size
CYP52A2A | CYP2A#1 | 3659-72M | CCTTAATTAAATGCACGAAGCGGAGATAAAAG (SEQ ID NO: 1) | 2230 bp
CYP52A2A | CYP2A#2 | 3659-72N | CCTTAATTAAGCA[illegible]TCT (SEQ ID NO: 2) |
CYP52A3A | CYP3A#1 | 3659-72O | CCTTAAACGCAATGGGAACATGGAGTG (SEQ ID NO: 3) | 2154 bp
CYP52A3A | [illegible] | [illegible] | CCTTAATTAATCGCACTACGGTTATTGGTATCAG (SEQ ID NO: 4) |
CYP52A5A | CYP5A#1 | 3659-72K | CCTTAATTAATCAAAGTACGTTCAGGCGG (SEQ ID NO: 5) | 3298 bp
CYP52A5A | CYP5A#2 | 3659-72L | CCTTAATTAAGGCAGACAACAACTTGGCAAAGTC (SEQ ID NO: 6) |
CPRB | CPRB#1 | 3698-20A | CCTTAATTAAGAGGTCGTTGGTTGAGTTTTC (SEQ ID NO: 7) | 3266 bp
CPRB | CPRB#2 | 3698-20B | CCTTAATTAATTGATAATGACGTTGCGGG (SEQ ID NO: 8) |
URA3A | URA Primer 1a | 3698-7C | AGGCGCGCC[illegible]GAGTCCAAAAAGACCAACCTCTG (SEQ ID NO: 9) | 956 bp
URA3A | URA Primer 1b | 3698-7D | CCTTAATTAATACGTGGATACCTTCAAGCAAGTG (SEQ ID NO: 10) |
URA3A | URA Primer 2a | 3698-7A | CCTTAATTAAGCTCACGAGTTTTGGGATTTTCGAG (SEQ ID NO: 11) |
URA3A | URA Primer 2b | 3698-7B | GGGTTTAAACCGCAGAGGTTGGTCTTTTTGGACTC (SEQ ID NO: 12) |
 | | | GGGTTTAAAC - PmeI restriction site (SEQ ID NO: 13) |
 | | | AGGCGCGCC - AscI restriction site (SEQ ID NO: 14) |
 | | | CCTTAATTAA - PacI restriction site (SEQ ID NO: 15) |
CPR | FMN1 | 3674-41-1 | TCYCAAACWGGTACWGCWGAA (SEQ ID NO: 16) |
CPR | FMN2 | 3674-41-2 | GGTTTGGGTAAYTCWACTTAT (SEQ ID NO: 17) |
CPR | TFAD | 3674-41-3 | CGTTATTAYTCYATTTCTTC (SEQ ID NO: 18) |
CPR | NADPH | 3674-41-4 | GCMACACCRGTACCTGGACC (SEQ ID NO: 19) |
CPR | PRK1.F3 | PRK1.F3 | ATCCCAATCGTAATCAGC (SEQ ID NO: 20) |
CPR | PRK1.F5 | PRK1.F5 | ACTTGTCTTCGTTTAGCA (SEQ ID NO: 21) |
CPR | PRK4.R20 | PRK4.R20 | CTACGTCTGTGGTGATGC (SEQ ID NO: 22) |
CYP | UCup1 | UCup1 | CGNGAYACNACNGCNGG (SEQ ID NO: 23) |
CYP | UCup2 | UCup2 | AGRGAYACNACNGCNGG (SEQ ID NO: 24) |
CYP | UCdown1 | UCdown1 | AGNGCRAAYTGYTGNCC (SEQ ID NO: 25) |
CYP | UCdown2 | UCdown2 | YAANGCRAAYTGYTGNCC (SEQ ID NO: 26) |
CYP | HemeB1 | HemeB1 | ATTCAACGGTGGTCCAAGAATCTGTTTGG (SEQ ID NO: 27) |
CYP | 2.3.5P | 2,3,5P | GAGCTATGTTGAGACCACAGTTTGC (SEQ ID NO: 28) |
CYP | 2.3.5M | 2,3,5M | [illegible]CAGTTAAAGCAAATTGTTTGGCC (SEQ ID NO: 29) |
pTriplEx vector | Triplex5' | Triplex5' | CTCGGGAAGCGCGCCATTGTGTTGG (SEQ ID NO: 30) |
pTriplEx vector | Triplex3' | Triplex3' | TAATACGACTCACTATAGGGCGAATTGGC (SEQ ID NO: 31) |
CYP | Cyp52a | Cyp52a | TGRYTCAAACCATCTYTCTGG (SEQ ID NO: 32) |
CYP | Cyp52b | Cyp52b | GGACCGGCGTTAAAGGG (SEQ ID NO: 33) |
CYP | Cyp52c | Cyp52c | CATAGTCGWATYATGCTTAGACC (SEQ ID NO: 34) |
CYP | Cyp52d | Cyp52d | GGACCACCATTGAATGG (SEQ ID NO: 35) |
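The degenerate primers in Table 4 use the ambiguity codes defined in the table legend. The sketch below counts how many distinct sequences each degenerate primer represents and checks the PacI, PmeI and AscI sites carried on the cloning primers. The helper names are hypothetical; the sequences are taken from Table 4.

```python
from itertools import product

# Ambiguity codes defined in the Table 4 legend: Y = C/T, R = A/G, W = A/T, M = A/C, N = any.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "Y": "CT", "R": "AG", "W": "AT", "M": "AC", "N": "ACGT"}

# Recognition sites on the primer tails; all three are palindromic,
# so checking one strand is sufficient.
SITES = {"PacI": "TTAATTAA", "PmeI": "GTTTAAAC", "AscI": "GGCGCGCC"}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for base in primer:
        n *= len(IUPAC[base])
    return n


def expand(primer: str) -> list:
    """Enumerate every concrete sequence in a degenerate primer pool."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]


def sites_in(seq: str) -> list:
    """Names of the restriction sites present in a primer sequence."""
    return [name for name, site in SITES.items() if site in seq]


print(degeneracy("CGNGAYACNACNGCNGG"))               # UCup1 -> 512
print(len(expand("TCYCAAACWGGTACWGCWGAA")))          # FMN1 -> 16
print(sites_in("CCTTAATTAAATGCACGAAGCGGAGATAAAAG"))  # CYP2A#1 -> ['PacI']
```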



EXAMPLE 8

Yeast Colony PCR Procedure for Confirmation of Gene
Integration into the Genome of C. tropicalis

Single yeast colonies were removed from the surface of transformation plates, suspended in 50 µl of spheroplasting buffer (50 mM KCl, 10 mM Tris-HCl, pH 8.3, 1.0 mg/ml Zymolyase, 5% glycerol) and incubated at 37°C for 30 min. Following incubation, the solution was heated for 10 min at 95°C to lyse the cells. Five µl of this solution was used as a template in PCR. Expand Hi-Fi Taq polymerase (Boehringer Mannheim, Indianapolis, IN) was used in PCR with a gene-specific primer (for the gene to be integrated) and a URA3 primer. If integration had occurred, amplification would yield a PCR product of the predicted size, confirming the presence of an integrated gene.

EXAMPLE 9

Fermentation Method for Gene Induction Studies

A fermentor was charged with a semi-synthetic growth medium having the composition 75 g/l glucose (anhydrous), 6.7 g/l Yeast Nitrogen Base (Difco Laboratories), 3 g/l yeast extract, 3 g/l ammonium sulfate, 2 g/l monopotassium phosphate and 0.5 g/l sodium chloride. Components were made as concentrated solutions for autoclaving and then added to the fermentor upon cooling; the final pH was approximately 5.2. This charge was inoculated with 5-10% of an overnight culture of C. tropicalis ATCC 20962 prepared in YM medium (Difco Laboratories) as described in the methods of Examples 17 and 20 of US Patent 5,254,466, which is incorporated herein by reference. C. tropicalis ATCC 20962 is a POX4- and POX5-disrupted C. tropicalis ATCC 20336. Air and agitation were supplied to maintain the dissolved oxygen at greater than about 40% of saturation versus air. The pH was maintained at about 5.0 to 8.5 by the addition of 5N caustic soda on pH control. Both a fatty acid feedstream (commercial oleic acid in this example) having a typical composition of 2.4% C14; 0.7% C14:1; 4.6% C16; 5.7% C16:1; 5.7% C17:1; 1.0% C18; 69.9% C18:1; 8.8% C18:2; 0.30% C18:3; 0.90% C20:1 and a glucose co-substrate feed were added in a fed-batch mode beginning near the end of exponential growth. Caustic was added on pH control during the bioconversion of fatty acids to diacids to maintain the pH in the desired range. Typically, samples for gene induction studies were collected just prior to starting the fatty acid feed and over the first 10 hours of bioconversion. Fatty acid and diacid content was determined by a standard methyl ester protocol using gas liquid chromatography (GLC). Gene induction was measured using the QC-RT-PCR protocol described in this application.



EXAMPLE 10

RNA Preparation

The first step of this protocol involves the isolation of total cellular RNA from cultures of C. tropicalis. The cellular RNA was isolated using the Qiagen RNeasy Mini Kit (Qiagen Inc., Chatsworth, CA) as follows: 2 ml samples of C. tropicalis cultures were collected from the fermentor in standard 2 ml screw-capped Eppendorf-style tubes at various times before and after the addition of the fatty acid or alkane substrate. Cell samples were immediately frozen in liquid nitrogen or a dry-ice/alcohol bath after their harvesting from the fermentor. To isolate total RNA from the samples, the tubes were allowed to thaw on ice, the cells were pelleted by centrifugation in a microfuge for 5 minutes (min) at 4°C, and the supernatant was discarded while keeping the pellet ice-cold. The microfuge tubes were filled 2/3 full with ice-cold Zirconia/Silica beads (0.5 mm diameter, Biospec Products, Bartlesville, OK) and the tubes were filled to the top with ice-cold RLT* lysis buffer (*buffer included with the Qiagen RNeasy Mini Kit). Cell rupture was achieved by placing the samples in a mini bead beater (Biospec Products, Bartlesville, OK) and immediately homogenizing at full speed for 2.5 min. The samples were allowed to cool in an ice water bath for 1 minute, and the homogenization/cooling process was repeated two more times for a total of 7.5 min homogenization time in the bead beater. The homogenized cell samples were microfuged at full speed for 10 min, and 700 µl of the RNA-containing supernatant was removed and transferred to a new Eppendorf tube. 700 µl of 70% ethanol was added to each sample, followed by mixing by inversion. This and all subsequent steps were performed at room temperature.

Seven hundred microliters of each ethanol-treated sample were transferred to a Qiagen RNeasy spin column, followed by centrifugation at 8,000 x g for 15 sec. The flow through was discarded and the column was reloaded with the remaining sample (700 µl) and re-centrifuged at 8,000 x g for 15 sec. The column was washed once with 700 µl of buffer RW1*, centrifuged at 8,000 x g for 15 sec, and the flow through discarded. The column was placed in a new 2 ml collection tube and washed with 500 µl of RPE* buffer and the flow through discarded. The RPE* wash was repeated with centrifugation at 8,000 x g for 2 min and the flow through
discarded. The spin column was transferred to a new 1.5 ml collection tube and 100 µl of RNase-free water was added to the column, followed by centrifugation at 8,000 x g for 15 seconds. An additional 75 µl of RNase-free water was added to the column, followed by centrifugation at 8,000 x g for 2 min. RNA eluted in the water flow through was collected for further purification.

The RNA eluate was then treated to remove contaminating DNA. Twenty microliters of 10X DNase I buffer (0.5 M Tris (pH 7.5), 50 mM CaCl2, 100 mM MgCl2), 10 µl of RNase-free DNase I (2 units/µl, Ambion Inc., Austin, Texas) and 40 units RNasin (Promega Corporation, Madison, Wisconsin) were added to the RNA sample. The mixture was then incubated at 37°C for 15 to 30 min. Samples were placed on ice, and 250 µl of lysis buffer RLT* and 250 µl of ethanol (200 proof) were added. The samples were then mixed by inversion, transferred to Qiagen RNeasy spin columns and centrifuged at 8,000 x g for 15 sec, and the flow through was discarded. Columns were placed in new 2 ml collection tubes and washed twice with 500 µl of RPE* wash buffer and the flow through discarded. Columns were transferred to new 1.5 ml Eppendorf tubes and RNA was eluted by the addition of 100 µl of DEPC-treated water, followed by centrifugation at 8,000 x g for 15 sec. Residual RNA was collected by adding an additional 50 µl of RNase-free water to the spin column, followed by centrifugation at full speed for 2 min. 10 µl of the RNA preparation was removed and quantified by the (A260/A280) method. RNA was stored at -70°C. Yields were found to be 30-100 ng total RNA per 2.0 ml of fermentation broth.
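The (A260/A280) quantification above can be sketched as follows. The conversion factor of 40 µg/ml per A260 unit for RNA (1 cm path length) is the common convention and is not stated in the text; the absorbance readings, the 1:10 dilution and the function names are hypothetical.

```python
A260_RNA_UG_PER_ML = 40.0  # conventional factor: 1 A260 unit ~ 40 µg/ml RNA (1 cm path)


def rna_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """RNA concentration of the undiluted preparation from an A260 reading."""
    return a260 * A260_RNA_UG_PER_ML * dilution_factor


def a260_a280(a260: float, a280: float) -> float:
    """Purity ratio; values near 2.0 are commonly taken to indicate protein-free RNA."""
    return a260 / a280


# Hypothetical readings of a 1:10 dilution of the 10 µl aliquot.
print(rna_ug_per_ml(0.25, dilution_factor=10))  # -> 100.0
print(a260_a280(0.25, 0.125))                   # -> 2.0
```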

EXAMPLE 11

Quantitative Competitive Reverse Transcription Polymerase
Chain Reaction (QC-RT-PCR) Protocol

QC-RT-PCR is a technique used to quantitate the amount of a specific RNA in an RNA sample. This technique employs the synthesis, by reverse transcription, of a specific DNA molecule that is complementary to an RNA molecule in the original sample, and its subsequent amplification by polymerase chain reaction. By adding various amounts of a competitor RNA molecule to the sample, one can determine the concentration of the RNA molecule of interest (in this case the mRNA transcripts of the CYP and CPR genes); because the competitor is amplified with the same primers as the target, the ratio of the two products reflects the ratio of their starting amounts. The levels of specific mRNA transcripts were assayed over time in response to the addition of fatty acid and/or alkane substrates to the growth medium of fermentation-grown C. tropicalis cultures, for the identification and characterization of the genes involved in the oxidation of these substrates. This approach can be used to identify the CYP and CPR genes involved in the oxidation of any given substrate based upon their transcriptional regulation.

A. Primer Design

The first requirement for QC-RT-PCR is the design of the primer pairs to be used in the reverse transcription and subsequent PCR reactions. These primers need to be unique and specific to the gene of interest. As there is a family of genetically similar CYP genes present in C. tropicalis 20336, care had to be taken to design primer pairs that would be discriminating and only amplify the gene of interest, in this example the CYP52A5 gene. In this manner, unique primers directed to substantially non-homologous (i.e., variable) regions within target members of a gene family are constructed. What constitutes a substantially non-homologous region is determined on a case-by-case basis. Such unique primers should be specific enough to anneal to the non-homologous region of the target gene without annealing to other, non-target members of the gene family. By comparing the known sequences of the members of a gene family, non-homologous regions are identified and unique primers are constructed which will anneal to those regions. It is contemplated that non-homologous regions herein would typically exhibit less than about 85% homology, but they can be more homologous depending on the positions which are conserved and the stringency of the reaction. After conducting PCR, it may be helpful to check the reaction product to ensure that it represents the unique target gene product. If not, the reaction conditions can be altered in terms of stringency to focus the reaction on the desired target. Alternatively, a new primer or new non-homologous region can be chosen. Due to the high level of homology between the genes of the CYP52A family, the most variable 5' region of the CYP52A5 coding sequence was targeted for the design of the primer pairs. In Figure 3, a portion of the 5' coding region for the CYP52A5A (SEQ ID NO: 36) allele of C. tropicalis 20336 is shown. The boxed sequences in Figure 3 are the sequences of the forward and backward primers (SEQ ID NOS: 47 and 48) used to quantitate expression of both alleles of this gene. The actual reverse primer (SEQ ID NO: 48) contains one less adenine than that shown in Figure 3. Primers used to measure the expression of specific C. tropicalis 20336 genes using the QC-RT-PCR protocol are listed in Table 5 (SEQ ID NOS: 37-58).
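The search for non-homologous primer regions described above can be illustrated with a sliding-window identity scan. This is a simplified sketch, assuming the family members are already aligned without gaps and equal in length; a real design would start from a proper multiple alignment and would also weigh melting temperature and 3' end mismatches. The 85% cutoff follows the text; the function names and toy sequences are hypothetical, not CYP52 data.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def variable_windows(target: str, others: list, width: int = 20,
                     max_identity: float = 85.0) -> list:
    """Start positions where the target window is below max_identity to the
    corresponding window of every other (pre-aligned, gap-free) family member."""
    hits = []
    for start in range(len(target) - width + 1):
        window = target[start:start + width]
        if all(percent_identity(window, o[start:start + width]) < max_identity
               for o in others):
            hits.append(start)
    return hits


# Toy aligned sequences: the family member diverges from the target at its 3' half.
print(variable_windows("AAAAAAAAAA", ["AAAAATTTTT"], width=4))  # -> [2, 3, 4, 5, 6]
```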






Table 5. Primers used to measure C. tropicalis gene expression in QC-RT-PCR

Primer name | Direction | Target | Sequence (5'-3')
3737-89F | F | CYP52A1A | CCGATGAAGTTTTCGACGAGTACCC (SEQ ID NO: 37)
3737-89B | B | CYP52A1A | AAGGCTTTAACGTGTCCAATCTGGTC (SEQ ID NO: 38)
alk2aF1 | F | CYP52A2A | ATTATCGCCACATACTTCACCAAATGG (SEQ ID NO: 39)
alk2aB5 | B | CYP52A2A | CGAGATCGTGGATACGCTGGAGTG (SEQ ID NO: 40)
7581-178-3 | F | CYP52A3A | [illegible] (SEQ ID NO: 41)
7581-178-4 | B | CYP52A3A | CATTGAACTGAGTAGCCAAAACAGCC (SEQ ID NO: 42)
3737-50F | F | CYP52A3A & CYP52A3B | CCTACGTTTGGTATCGCTACTCCGTTG (SEQ ID NO: 43)
3737-50B | B | CYP52A3A & CYP52A3B | [illegible] (SEQ ID NO: 44)
3737-175F | F | CYP52D4A | GCAGAGCCGATCTATGTTGCGTCC (SEQ ID NO: 45)
3737-175B | B | CYP52D4A | TCATTGAATGCTTCCAGGAACCTCG (SEQ ID NO: 46)
7581-97-F | F | CYP52A5A & CYP52A5B | AAGAGGGCAGGGCTCAAGAG (SEQ ID NO: 47)
7581-97-M | B | CYP52A5A & CYP52A5B | TCCATGTGAAGATCCCATCAC (SEQ ID NO: 48)
4P-2 | F | CYP52A8A | CTTGAAGGCCGTGTTGAACG (SEQ ID NO: 49)
4M-1 | B | CYP52A8A | CAGGATTTGTCTGAGTTGCCG (SEQ ID NO: 50)
3737-52F | F | POX4A & POX4B | CCATTGCCTTGAGATACGCCATTGGTAG (SEQ ID NO: 51)
3737-52B | B | POX4A & POX4B | AGCCTTGGTGTCGTTCTTTTCAACGG (SEQ ID NO: 52)
3737-53F | F | POX5A | TTGGGTTTGTTTGTTTCCTGTGTCCG (SEQ ID NO: 53)
3737-53B | B | POX5A | CCTTTGACCTTCAATCTGGCGTAGACG (SEQ ID NO: 54)
F33 | F | CPRA | [illegible]CTGAATACGCTGAAGGTGATG (SEQ ID NO: 55)
B63 | B | CPRA | TGGAGCTGAACAACTCTCTCGTCTCGG (SEQ ID NO: 56)
3737-133F | F | CPRA & CPRB | TTCCTCAACACGGACAGCGG (SEQ ID NO: 57)
3737-133B | B | CPRA & CPRB | AGTCAACCAGGTGTGGAACTCGTC (SEQ ID NO: 58)

F = Forward, B = Backward
B. Design and Synthesis of the Competitor DNA Template

The competitor RNA is transcribed in vitro from a competitor DNA template. The DNA template for the synthesis of the competitor RNA is synthesized using PCR primers that are between 46 and 60 nucleotides in length. The primer pairs for the synthesis of the competitor DNA are shown in Tables 6 and 7 (SEQ ID NOS: 59 and 60).



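Table 6. Primers for the synthesis of the CYP52A5 competitor DNA template

Primer | Sequence (5'-3')
Forward | GGATCCTAATACGACTCACTATAGGGAGGAAGAGGGCAGGGCTCAAGAG (SEQ ID NO: 59)
Reverse | TCCATGTGAAGATCCCATCACGAGTGTGCCTCTTGCCCAAAG (SEQ ID NO: 60)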



Table 7. Primers for the synthesis of the QC-RT-PCR competitor RNA templates

Primer name | Direction | Target | Sequence (5'-3')
3737-89C | F | CYP52A1A | [illegible]AAGTTTTCGACGAGTACCC (SEQ ID NO: 61)
3737-89D | B | CYP52A1A | AAGGCTTTAACGTGTCCAATCTGGTCAACATAGCTCTGGAGTGCTTCCAACC (SEQ ID NO: 62)
7581-137-A | F | CYP52A2A | [illegible] (SEQ ID NO: 63)
7581-137-B | B | CYP52A2A | [illegible] (SEQ ID NO: 64)
7581-137-D | F | CYP52A3A | [illegible] (SEQ ID NO: 65)
7581-137-C | B | CYP52A3A | [illegible] (SEQ ID NO: 66)
3737-50-C | F | CYP52A3A & CYP52A3B | GGATCCTAATACGACTCACTATAGGGAGGCCTACGTTTGGTATCGCTACTCCGTTG (SEQ ID NO: 67)
[illegible] | B | CYP52A3A & CYP52A3B | [illegible]TTCCAGCCAGCACCGTC[illegible]CAACAAGGAGTACAAGAAATCGTGTC (SEQ ID NO: 68)
3737-175C | F | CYP52D4A | GGATCCTAATACGACTCACTATAGGGAGGGCAGAGCCGATCTATGTTGCGTCC (SEQ ID NO: 69)
3737-175D | B | CYP52D4A | TCATTGAATGCTTCCAGGAACCTCGCCACATCCATCGAGAACCGG (SEQ ID NO: 70)
7581-97-A | F | CYP52A5A & CYP52A5B | GGATCCTAATACGACTCACTATAGGGAGGAAGAGGGCAGGGCTCAAGAG (SEQ ID NO: 59)
7581-97-B | B | CYP52A5A & CYP52A5B | TCCATGTGAAGATCCCATCACGAGTGTGCCTCTTGCCCAAAG (SEQ ID NO: 60)
4P-2/T7 | F | CYP52A8A | GGATCCTAATACGACTCACTATAGGGAGGCTTGAAGGCCGTGTTGAACG (SEQ ID NO: 71)
4M-3/4M-1 | B | CYP52A8A | CAGGATTTGTCTGAGTTGCCGCCTGATCAAGATAGGATCCTTGCCG (SEQ ID NO: 72)
3737-26-D | F | CPRA | GGATCCTAATACGACTCACTATAGGGAGG[illegible]CTGAATACGCTGAAGGTGATG (SEQ ID NO: 73)
3737-26-C | B | CPRA | TGGAGCTGAACAACTCTCTCGTCTCGGGTGGTCGAATGGACCCTTGGTCAAG (SEQ ID NO: 74)
3737-133C | F | CPRA & CPRB | GGATCCTAATACGACTCACTATAGGGAGGTTCCTCAACACGGACAGCGG (SEQ ID NO: 75)
3737-133D | B | CPRA & CPRB | AGTCAACCAGGTGTGGAACTCGTCGGTGGCAACAATGAAAAACACCAAG (SEQ ID NO: 76)
3737-52-C | F | POX4A & POX4B | GGATCCTAATACGACTCACTATAGGGAGGCCATTGCCTTGAGATACGCCATTGGTAG (SEQ ID NO: 77)
3737-52-D | B | POX4A & POX4B | AGCCTTGGTGTCGTTCTTTTCAACGGAAGGTGGTCTCGATGGTGTGTTCAACC (SEQ ID NO: 78)
3737-53-C | F | POX5A | GGATCCTAATACGACTCACTATAGGGAGGTTGGGTTTGTTTGTTTCCTGTGTCCG (SEQ ID NO: 79)
3737-53-D | B | POX5A | CCTTTGACCTTCAATCTGGCGTAGACGCAGCACCACCGATCCACCACTTG (SEQ ID NO: 80)

F = Forward, B = Backward
The forward primer (SEQ ID NO: 59) contains the T7 promoter consensus sequence "GGATCCTAATACGACTCACTATAGGGAGG" fused to the primer 7581-97-F sequence (SEQ ID NO: 47). The reverse primer (SEQ ID NO: 60) contains the sequence of primer 7581-97M (SEQ ID NO: 48) followed by the 20 bases of upstream sequence, with an 18 base pair deletion between the two blocks of the CYP52A5 sequence. Because this deletion lies between the primer binding sites, the competitor is amplified by the same primers as the native transcript but yields a product 18 bp shorter, which allows the two products to be resolved. The forward primer was used with the corresponding reverse primer to synthesize the competitor DNA template. The primer pairs were combined in a standard Taq Gold polymerase PCR reaction according to the manufacturer's recommended conditions (Perkin-Elmer/Applied Biosystems, Foster City, CA). The PCR reaction mix contained a final concentration of 250 nM of each primer and 10 ng C. tropicalis chromosomal DNA as template. The reaction mixture was placed in a thermocycler for 25 to 35 cycles using the highest annealing temperature possible during the PCR reactions to assure a homogeneous PCR product (in this case 62°C). The PCR products were either gel purified or filter purified to remove unincorporated nucleotides and primers. The competitor template DNA was then quantified using the (A260/A280) method. Primers used in QC-RT-PCR experiments for the synthesis of various competitive DNA templates are listed in Table 7 (SEQ ID NOS: 61-80).
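The competitor primer design can be expressed directly. The sketch fuses the quoted T7 tail to primer 7581-97-F and shows, on a hypothetical 200 bp amplicon, how an internal 18 bp deletion yields a competitor product 18 bp shorter than the native one. The function names and the placeholder amplicon are illustrative, not taken from the text.

```python
T7_TAIL = "GGATCCTAATACGACTCACTATAGGGAGG"  # T7 promoter consensus as quoted in the text


def competitor_forward_primer(gene_forward: str) -> str:
    """Fuse the T7 promoter tail to the 5' end of the gene-specific forward primer."""
    return T7_TAIL + gene_forward


def delete_block(seq: str, start: int, length: int) -> str:
    """Remove an internal block of `length` bases starting at `start`."""
    return seq[:start] + seq[start + length:]


fwd = competitor_forward_primer("AAGAGGGCAGGGCTCAAGAG")  # 7581-97-F (SEQ ID NO: 47)
print(fwd, len(fwd))  # the SEQ ID NO: 59 sequence, 49 nt

native = "ACGT" * 50                       # placeholder 200 bp native amplicon
competitor = delete_block(native, 60, 18)  # internal 18 bp deletion
print(len(native) - len(competitor))       # -> 18
```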

C. Synthesis of the Competitor RNA

Competitor template DNA was transcribed in vitro to make the competitor RNA using the MEGAscript T7 kit from Ambion Biosciences (Ambion Inc., Austin, Texas). 250 nanograms (ng) of competitor DNA template and the in vitro transcription reagents were mixed according to the directions provided by the manufacturer. The reaction mixture was incubated for 4 hours at 37°C. The resulting RNA preparations were then checked by gel electrophoresis for the conditions giving the highest yield and quality of competitor RNA; this often required optimization according to the manufacturer's specifications. The DNA template was then removed using DNase I as described in the Ambion kit. The RNA competitor was then quantified by the (A260/A280) method. Serial dilutions of the RNA (1 ng/µl to 1 femtogram (fg)/µl) were made for use in the QC-RT-PCR reactions, and the original stocks were stored at -70°C.
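The competitor RNA dilution series runs from 1 ng/µl to 1 fg/µl, i.e. seven 10-fold steps. A sketch of the series, working in integer femtograms to avoid rounding; the 5 µl + 45 µl transfer in the comment is a hypothetical example of a 1:10 step, and the function names are illustrative.

```python
def tenfold_series_fg_per_ul(start_fg=10**6, stop_fg=1):
    """Concentrations (fg/µl) of a 10-fold series from 1 ng/µl down to 1 fg/µl."""
    series = []
    conc = start_fg
    while conc >= stop_fg:
        series.append(conc)
        conc //= 10  # one step: 1 part previous dilution + 9 parts diluent (e.g. 5 µl + 45 µl)
    return series


def label(fg_per_ul):
    """Express a concentration in the largest convenient unit."""
    for unit, scale in (("ng", 10**6), ("pg", 10**3), ("fg", 1)):
        if fg_per_ul >= scale:
            return f"{fg_per_ul // scale} {unit}/µl"
    return f"{fg_per_ul} fg/µl"


series = tenfold_series_fg_per_ul()
print([label(c) for c in series])
# Dilutions in the 100 pg to 100 fg per µl range used for the QC-RT-PCR tubes:
print([label(c) for c in series if 100 <= c <= 10**5])
```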
D. QC-RT-PCR Reactions

QC-RT-PCR reactions were performed using rTth polymerase from Perkin-Elmer (Perkin-Elmer/Applied Biosystems, Foster City, CA) according to the manufacturer's recommended conditions. The reverse transcription reaction was performed in a 10 µl volume with final concentrations of 200 µM of each dNTP, 1.25 units rTth polymerase, 1.0 mM MnCl2, 1X of the 10X buffer supplied with the enzyme by the manufacturer, 100 ng of total RNA isolated from a fermentor-grown culture of C. tropicalis and 1.25 µM of the appropriate reverse primer. To quantitate CYP52A5 expression in C. tropicalis, an appropriate reverse primer was 7581-97M (SEQ ID NO: 48). Several reaction mixes were prepared for each RNA sample characterized. To quantitate CYP52A5 expression, a series of 8 to 12 of the previously described QC-RT-PCR reaction mixes were aliquoted to different reaction tubes. To each tube, 1 µl of a serial dilution containing from 100 pg to 100 fg CYP52A5 competitor RNA per µl was added, bringing the reaction mixtures up to the final volume of 10 µl. The QC-RT-PCR reaction mixtures were mixed and incubated at 70°C for 15 min, according to the manufacturer's recommended times for reverse transcription to occur. At the completion of the 15 minute incubation, the sample temperature was reduced to 4°C to stop the reaction, and 40 µl of the PCR reaction mix was added to bring the total volume up to 50 µl. The PCR reaction mix consists of an aqueous solution containing 0.3125 µM of the forward primer 7581-97F (SEQ ID NO: 47), 3.125 mM MgCl2 and 1X chelating buffer supplied with the enzyme from Perkin-Elmer. The reaction mixtures were placed in a thermocycler (Perkin-Elmer GeneAmp PCR System 2400, Perkin-Elmer/Applied Biosystems, Foster City, CA) and the following PCR cycle was performed: 94°C for 1 min, followed by 17 to 22 cycles of 94°C for 10 seconds and 58°C for 40 seconds. The PCR reaction was completed with a final incubation at 58°C for 2 min followed by 4°C. In some reactions where no detectable PCR products were produced, the samples were returned to the thermocycler for additional cycles; this process was repeated until enough PCR product was produced to quantify using HPLC. The number of cycles necessary to produce enough PCR product is a function of the amount of the target mRNA in the 100 ng of total cellular RNA. In cultures where the CYP52A5 gene is highly expressed, there is sufficient CYP52A5 mRNA present and fewer PCR cycles (<17) are required to produce a quantifiable amount of PCR product. The lower the concentration of the target mRNA present, the more PCR cycles are required to produce a detectable amount of product. These QC-RT-PCR procedures were applied to all the target genes listed in Table 5 using the respective primers indicated therein.
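Adding 40 µl of PCR mix to the 10 µl RT reaction dilutes each component into the 50 µl PCR. The sketch below computes the resulting concentrations and, with an idealized exponential model in which efficiency E = 1 (perfect doubling) is an assumption, shows why a more abundant target reaches a detectable amount in fewer cycles. The function names are illustrative.

```python
import math


def final_conc(stock: float, added_ul: float, total_ul: float) -> float:
    """C1*V1 = C2*V2 for one component added into a larger final volume."""
    return stock * added_ul / total_ul


RT_UL, PCR_MIX_UL = 10.0, 40.0
TOTAL_UL = RT_UL + PCR_MIX_UL  # 50 µl

print(final_conc(0.3125, PCR_MIX_UL, TOTAL_UL))  # forward primer, µM -> 0.25
print(final_conc(1.25, RT_UL, TOTAL_UL))         # reverse primer, µM -> 0.25
print(final_conc(3.125, PCR_MIX_UL, TOTAL_UL))   # MgCl2, mM -> 2.5
print(final_conc(1.0, RT_UL, TOTAL_UL))          # MnCl2, mM -> 0.2


def cycles_to_threshold(start_copies: float, threshold_copies: float,
                        efficiency: float = 1.0) -> float:
    """Cycles n such that start * (1 + E)^n reaches the threshold (exponential phase only)."""
    return math.log(threshold_copies / start_copies) / math.log(1.0 + efficiency)


# A 10-fold more abundant target needs log2(10), about 3.3, fewer cycles at E = 1.
print(round(cycles_to_threshold(1e3, 1e9) - cycles_to_threshold(1e4, 1e9), 2))  # -> 3.32
```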

E. HPLC Quantification 

Upon completion of the QC-RT-PCR reactions, the samples were analyzed and
quantitated by HPLC. Five to fifteen microliters of the QC-RT-PCR reaction mix was injected
into a Waters Bio-Compatible 625 HPLC with an attached Waters 484 tunable detector. The
detector was set to measure a wavelength of 254 nm. The HPLC contained a Sarasep brand
DNASep™ column (Sarasep, Inc., San Jose, CA), which was placed within the oven with the
temperature set to 52 °C. The column was installed according to the manufacturer's
recommendation of having 30 cm of heated PEEK tubing installed between the injector and the
column. The system was configured with a Sarasep brand Guard column positioned before the
injector. In addition, there was a 0.22 µm filter disk just before the column, within the oven.
Two buffers were used to create an elution gradient to resolve and quantitate the PCR products
from the QC-RT-PCR reactions. Buffer-A consisted of 0.1 M triethylammonium acetate
(TEAA) and 5% acetonitrile (volume to volume). Buffer-B consisted of 0.1 M TEAA and 25%
acetonitrile (volume to volume). The QC-RT-PCR samples were injected into the HPLC and a
linear gradient from 75% Buffer-A/25% Buffer-B to 45% Buffer-A/55% Buffer-B was run over 6 min at a
flow rate of 0.85 ml per minute. The QC-RT-PCR product of the competitor RNA, being 18
base pairs smaller, is eluted from the HPLC column before the QC-RT-PCR product from the
CYP52A5 mRNA (U). The amounts of the QC-RT-PCR products are plotted and quantitated with
an attached Waters Corporation 745 data module. The log ratios of the amount of CYP52A5
mRNA QC-RT-PCR product (U) to competitor QC-RT-PCR product (C), as measured by peak
areas, were plotted, and the amount of competitor RNA required to equal the amount of CYP52A5
mRNA product was determined. In the case of each of the target genes listed in Table 5, the
competitor RNA contained fewer base pairs than the native target mRNA and eluted
before the native mRNA in a manner similar to that demonstrated for CYP52A5. HPLC
quantification of the genes was conducted as above.
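As a worked check on the elution gradient described above, the effective acetonitrile content at any point in the run follows from mixing Buffer-A (5% acetonitrile) and Buffer-B (25% acetonitrile) in the stated proportions. The sketch below assumes the gradient is strictly linear in time between the stated endpoints and ignores any dwell-volume delay in the system; the function names are illustrative only.

```python
# Sketch: effective acetonitrile (ACN) content of the linear elution gradient.
# Assumes Buffer-A = 0.1 M TEAA + 5% ACN and Buffer-B = 0.1 M TEAA + 25% ACN (v/v),
# a linear ramp from 25% to 55% Buffer-B over 6 min, and no dwell-volume delay.

ACN_A = 5.0                  # % ACN in Buffer-A
ACN_B = 25.0                 # % ACN in Buffer-B
B_START, B_END = 25.0, 55.0  # % Buffer-B at start and end of the gradient
RUN_MIN = 6.0                # gradient duration, minutes
FLOW_ML_PER_MIN = 0.85


def percent_b(t_min: float) -> float:
    """Percent Buffer-B at time t (minutes), clamped to the gradient window."""
    t = min(max(t_min, 0.0), RUN_MIN)
    return B_START + (B_END - B_START) * t / RUN_MIN


def percent_acn(t_min: float) -> float:
    """Effective % ACN as the volume-weighted mix of the two buffers."""
    b = percent_b(t_min) / 100.0
    return (1.0 - b) * ACN_A + b * ACN_B


if __name__ == "__main__":
    for t in (0.0, 3.0, 6.0):
        print(f"t = {t:.1f} min: {percent_b(t):.1f}% B, {percent_acn(t):.1f}% ACN")
    print(f"Eluent delivered over the gradient: {FLOW_ML_PER_MIN * RUN_MIN:.2f} ml")
```

Under these assumptions the 6-min ramp moves the mobile phase from 10% to 16% acetonitrile while delivering about 5.1 ml of eluent, which is the window in which the shorter competitor product elutes ahead of the target product.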
EXAMPLE 12
Evaluation of New Strains in Shake Flasks 

Strains with amplified CYP and CPR genes, such as HDC10, HDC15, HDC20 and
HDC23 (Table 1), and strain H5343 were evaluated for diacid production in shake flasks. A single
colony of each strain was transferred from a YPD agar plate into 5 ml of YPD broth and grown
overnight at 30°C, 250 rpm. An inoculum was then transferred into 50 ml of DCA2 medium
(Chart) and grown for 24 h at 30°C, 300 rpm. The cells were centrifuged at 5000 rpm for 5 min,
resuspended in 50 ml of DCA3 medium (Chart) and grown for 24 h at 30°C, 300 rpm. Oleic acid
(3% w/v) was added after 24 h of growth in DCA3 medium, and the cultures were allowed to
bioconvert the oleic acid for 48 h. Samples were harvested and the diacid and monoacid
concentrations were analyzed as per the scheme given in Figure 35. Each strain was tested in
duplicate, and the results shown in Table 8 represent the average value from two flasks.
Table 8. Bioconversion of oleic acid by different recombinant strains of Candida tropicalis

Strain      Conversion to oleic diacid (%)   Specific conversion (g diacid/g biomass)
H5343       41.9                             0.53
HDC 10-2    50.5                             0.85
HDC15       54.4                             0.85
HDC 20-1    45.1                             0.72
HDC 20-2    45.3                             0.58
HDC 23-2    55.2                             0.84
HDC 23-3    58.8                             0.89
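To compare the strains in Table 8 side by side, each value can be expressed relative to one reference strain. The sketch below uses H5343 as that reference purely for illustration (the text does not designate a baseline strain); the values are transcribed from Table 8 and the helper name is hypothetical.

```python
# Values transcribed from Table 8 (mean of duplicate shake flasks).
# strain: (conversion to oleic diacid, %; specific conversion, g diacid/g biomass)
TABLE_8 = {
    "H5343": (41.9, 0.53),
    "HDC 10-2": (50.5, 0.85),
    "HDC15": (54.4, 0.85),
    "HDC 20-1": (45.1, 0.72),
    "HDC 20-2": (45.3, 0.58),
    "HDC 23-2": (55.2, 0.84),
    "HDC 23-3": (58.8, 0.89),
}
BASELINE = "H5343"  # illustrative choice of reference strain


def relative_to_baseline(table, baseline=BASELINE):
    """Ratio of each strain's conversion and specific conversion to the baseline's."""
    base_conv, base_spec = table[baseline]
    return {
        strain: (conv / base_conv, spec / base_spec)
        for strain, (conv, spec) in table.items()
    }


if __name__ == "__main__":
    ratios = relative_to_baseline(TABLE_8)
    # List strains from highest to lowest specific conversion.
    for strain, (conv_r, spec_r) in sorted(ratios.items(), key=lambda kv: -kv[1][1]):
        print(f"{strain:10s} conversion x{conv_r:.2f}  specific conversion x{spec_r:.2f}")
```

Read this way, HDC 23-3 shows both the highest conversion and roughly 1.7 times the specific conversion of H5343 in these duplicate-flask averages.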



EXAMPLE 13 

Cloning and Characterization of C. tropicalis 20336 Cytochrome P450
Monooxygenase (CYP) and Cytochrome P450 NADPH Oxidoreductase (CPR) Genes 

To clone CYP and CPR genes, several different strategies were employed.
Available CYP amino acid sequences were aligned and regions of similarity were observed
(Figure 4). These regions corresponded to described conserved regions seen in other cytochrome
P450 families (Goeptar et al., supra and Kalb et al., supra). Proteins from eight eukaryotic
cytochrome P450 families share a segmented region of sequence similarity. One region
corresponded to the HR2 region [...] terminus, which is [...] of the I helix thought to be
involved in substrate recognition (Figure 4). Degenerate oligonucleotide primers corresponding
to these highly conserved regions of the CYP52 gene family were used to amplify DNA fragments
of CYP genes [...]. The discrete PCR fragments were then used as probes to isolate full-length
CYP genes from the C. tropicalis 20336 genomic libraries. In a few instances, oligonucleotide
primers corresponding to highly conserved regions were directly used as probes to isolate
full-length CYP genes from genomic libraries. In the case of CPR, [...] a heterologous probe
based upon the known DNA [...] CPR gene [...].



A. Cloning of the CPR Gene from C. tropicalis 20336

1) Cloning of the CPRA Allele

Approximately 25,000 phage particles from the first genomic library of C. [...]
fragment from plasmid pRH[...] (Picataggio et al., Bio/Technology 10:894-898 (1992) [...])
[...] most of the CPR gene. Five clones that hybridized to the probe
were isolated, and the plasmid DNA from these lambda clones was rescued and characterized by
restriction enzyme analysis. The restriction enzyme analysis suggested that all five clones were
identical, but it was not clear that a complete CPR gene was present.

PCR analysis was used to determine whether a complete CPR gene was present in any of
the five clones. Degenerate primers were prepared for highly conserved regions of known CPR
genes ([...] incorporated herein by reference; Figure 4). Two primers were synthesized for the
FMN binding region (FMN1, SEQ ID NO: 16 and FMN2, SEQ ID NO: 17). One primer was
synthesized for the FAD binding region (FAD, SEQ ID NO: 18) and one for the NADPH binding
region (NADPH, SEQ ID NO: 19) (Table 4). These four primers were used in PCR amplification
experiments using as template plasmid DNA isolated from four of the five clones. The FMN (SEQ
ID NOS: 16 and 17) and FAD (SEQ ID NO: 18) primers served as forward primers and the
NADPH primer (SEQ ID NO: 19) as the reverse primer in the PCR reactions. When different
combinations of forward and reverse primers were used, no PCR products were obtained from
any of the plasmids. However, all primer combinations amplified products of the expected size
with a plasmid containing [...]. The most likely reason for the failure of the primer pairs to
amplify a product was that all four clones contained a truncated CPR gene. One of the four
clones (pHKM1) was sequenced using the Triplex 5' [...] the NADPH binding site described
above. The NADPH primer (SEQ ID NO: 19) failed to yield [...]. Sequences obtained with
Triplex primers were compared with the C. tropicalis 750 CPR sequence using the MacVector™
program (Oxford Molecular Group, Campbell, CA). Sequence obtained with the Triplex 3' primer
(SEQ ID NO: 31) showed similarity to an internal sequence of the C. tropicalis 750 CPR gene
[...]. pHKM1 had a 3.8 kb insert which included a 1.2 kb coding region of the CPR gene
accompanied by 2.5 kb of upstream DNA (Figure 5). Approximately 0.85 kb of the 20336 CPR
gene encoding the C-terminal portion of the CPR protein is missing from this clone.

Since the first Clontech library yielded only a truncated CPR gene, the second
library prepared by Clontech was screened to isolate a full-length CPR gene. Three putative
CPR clones were obtained. The three clones, having inserts in the range of 5-7 kb, were
designated pHKM2, pHKM3 and pHKM4. All three were characterized by PCR using the
degenerate primers described above. Both pHKM2 and pHKM4 gave PCR products with two
sets of internal primers. pHKM3 gave a PCR product only with the FAD (SEQ ID NO: 18) and
NADPH (SEQ ID NO: 19) primers, suggesting that this clone likely contained a truncated CPR
gene. All three plasmids were partially sequenced using the two Triplex primers and a third
primer whose sequence was selected from the DNA sequence near the truncated end of the CPR
gene present in pHKM1. This analysis confirmed that both pHKM2 and pHKM4 have sequences
that overlap pHKM1 and that both contain the 3' region of the CPR gene that is missing from
pHKM1. Portions of the inserts from pHKM1 and pHKM4 were sequenced and a full-length CPR
gene was identified. Based on the DNA sequence and PCR analysis, it was concluded that
pHKM1 contained the putative promoter region and 1.2 kb of sequence encoding a portion (5'
end) of a CPR gene. pHKM4 had 1.1 kb of DNA that overlapped pHKM1 and contained the
remainder (3' end) of a CPR gene along with a downstream untranslated region (Figure 6).
Together these two plasmids contained a complete CPRA gene with an upstream promoter
region. CPRA is 4206 nucleotides in length (SEQ ID NO: 81) and includes a regulatory region
and a protein coding region (defined by nucleotides 1006-3042) which is 2037 base pairs in
length and codes for a putative protein of 679 amino acids (SEQ ID NO: 83) (Figures 13 and
14). In Figure 13, the asterisks denote conserved nucleotides between CPRA and CPRB, bold
denotes protein coding nucleotides, and the start and stop codons are underlined. The CPRA
protein, when analyzed by the protein alignment program of the GeneWorks™ software package
(Oxford Molecular Group, Campbell, CA), showed extensive homology to CPR proteins from
C. tropicalis 750 and C. maltosa.



2) Cloning of the CPRB Allele
To clone the second allele, CPRB, the third genomic library, prepared by Henkel,
was screened using DNA fragments from pHKM1 and pHKM4 as probes. Five clones were
obtained and these were sequenced with the three internal primers used to sequence CPRA.
These primers were designated PRK1J3 (SEQ ID NO: 20), PRKI.F5 (SEQ ID NO: 21) and
[...] (SEQ ID NO: 22) (Table 4), together with the [...] primers (M13 -20 and T3) [...] the
polylinker region of the pBK [...] cloning vector. Sequence analysis suggested that four of
these clones, designated pHKM5 to 8, contained inserts which [...] region where the identity
was very high. However, there were significant differences in the 5' and 3' untranslated
regions. This suggested that the fifth clone [...] allele to [...]. The plasmid was designated
pHKM9 (Figure 7), and a 4.4 kb region of this plasmid was sequenced; the analysis of this
sequence confirmed the presence of the CPRB allele (SEQ ID NO: 82) (Figure 13). The amino
acid sequence of the CPRB protein is set forth in SEQ ID NO: 84.
B. Cloning of C. tropicalis 20336 (CYP) Genes
1) Cloning of CYP52A2A, CYP52A3A & 3B and CYP52A5A & 5B
Clones carrying the CYP52A2A, A3A, A3B, A5A and A5B genes were
isolated from the first and second Clontech genomic libraries using an oligonucleotide probe
(HemeBl, SEQ ID NO: 27) whose sequence was based upon the amino acid sequence of the
highly conserved heme binding region present throughout the CYP52 family. The first and
second libraries were converted to the plasmid form and screened by colony hybridization
using the HemeBl probe (SEQ ID NO: 27) (Table 4). Several potential clones were identified,
and the plasmid DNA was isolated from these clones and sequenced using the HemeBl
oligonucleotide (SEQ ID NO: 27) as a primer. This approach succeeded in identifying five
CYP52 genes. Three of the CYP genes appeared unique, while the remaining two were classified
as alleles. Based upon an arbitrary choice of homology to CYP52 genes from Candida maltosa,
these five genes and corresponding plasmids were designated CYP52A2A (pPA15 [Figure 26]),
CYP52A3A (pPA57 [Figure 29]), CYP52A3B (pPA62 [Figure 30]), CYP52A5A (pPAL3 [Figure
31]) and CYP52A5B (pPA5 [Figure 32]). The complete DNA sequence, including regulatory and
protein coding regions, of these five genes was obtained and confirmed that all five were CYP52
genes (Figure 15). In Figure 15, the asterisks denote conserved nucleotides among the CYP
genes, bold indicates the protein coding nucleotides of the CYP genes, and the start and stop
codons are underlined. The CYP52A2A gene as represented by SEQ ID NO: 86 has a protein
coding region defined by nucleotides 1199-2767, and the encoded protein has an amino acid
sequence as set forth in SEQ ID NO: 96. The CYP52A3A gene as represented by SEQ ID NO:
88 has a protein coding region defined by nucleotides 1126-2748, and the encoded protein has
an amino acid sequence as set forth in SEQ ID NO: 98. The CYP52A3B gene as represented by
SEQ ID NO: 89 has a protein coding region defined by nucleotides 913-2535, and the encoded
protein has an amino acid sequence as set forth in SEQ ID NO: 99. The CYP52A5A gene as
represented by SEQ ID NO: 90 has a protein coding region defined by nucleotides 1103-2656,
and the encoded protein has an amino acid sequence as set forth in SEQ ID NO: 100. The
CYP52A5B gene as represented by SEQ ID NO: 91 has a protein coding region defined by
nucleotides 1142-2695, and the encoded protein has an amino acid sequence as set forth in SEQ
ID NO: 101.



2) Cloning of CYP52A1A and CYP52A8A

CYP52A1A and CYP52A8A genes were isolated from the third genomic library
using PCR fragments as probes. The PCR fragment probe for CYP52A1 was generated by
PCR amplification of 20336 genomic DNA with oligonucleotide primers that were designed to
amplify a region from the Helix I region to the HR2 region using all available CYP52 genes from
the National Center for Biotechnology Information. Degenerate forward primers UCup1 (SEQ ID
NO: 23) and UCup2 (SEQ ID NO: 24) were designed based upon an amino acid sequence
(-RDTTAG-) from the Helix I region (Table 4). Degenerate primers UCdown1 (SEQ ID NO: 25)
and UCdown2 (SEQ ID NO: 26) were designed based upon an amino acid sequence (-GQQFAL-)
from the HR2 region (Table 4). For the reverse primers, the DNA sequence represents the
reverse complement of the DNA encoding the corresponding amino acid sequence. These
primers were used in pairwise combinations in a PCR reaction with Stoffel Taq DNA polymerase
(Perkin-Elmer Cetus, Foster City, CA) according to the manufacturer's recommended procedure.
A PCR product of approximately 450 bp was obtained. This product was purified from an
agarose gel using Gene-clean™ (Bio 101, LaJolla, CA) and ligated to the pTAG™ vector
(Figure 17) (R&D Systems, Minneapolis, MN) according to the recommendations of the
manufacturer. No treatment of the PCR product was necessary to clone it into pTAG because
the vector uses the TA cloning technique. Plasmids from several transformants were isolated
and their inserts were characterized. One plasmid contained the intact PCR fragment. The DNA
sequence of the PCR fragment (designated 44CYP3, SEQ ID NO: 107) shared homology with
the DNA sequences of the CYP52A1 gene of C. maltosa and the CYP52A3 gene of C. tropicalis
750. This fragment was used as a probe in isolating the C. tropicalis 20336 CYP52A1 homolog.
The third genomic library was screened using the 44CYP3 PCR probe (SEQ ID NO: 107) and a
clone (pHKM11) that contained a full-length CYP52 gene was obtained (Figure 8). The clone
contained a gene having regulatory and protein coding regions. An open reading frame of 1572
nucleotides encoded a CYP52 protein of 523 amino acids (Figures 15 and 16). This CYP52 gene
was designated CYP52A1A (SEQ ID NO: 85) since its putative amino acid sequence (SEQ ID NO:
95) was most similar to the CYP52A1 protein of C. maltosa. The protein coding region of the
CYP52A1A gene is defined by nucleotides 1177-2748 of SEQ ID NO: 85.
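The reverse primers from the HR2 region are described as the reverse complement of DNA encoding the conserved peptide -GQQFAL-. The sketch below shows that construction with IUPAC degenerate bases. The codon table is a hypothetical, simplified subset chosen for illustration (Leu is written as YTN, which also admits two Phe codons), and the output is not the actual primer sequence, which is given in Table 4.

```python
# Sketch: back-translate a short peptide into a degenerate (IUPAC) oligo and
# take its reverse complement, as needed for a reverse primer.
# The codon table is a hypothetical, simplified subset for illustration only.

DEGENERATE_CODON = {
    "G": "GGN",  # Gly: GGA/GGC/GGG/GGT
    "Q": "CAR",  # Gln: CAA/CAG
    "F": "TTY",  # Phe: TTC/TTT
    "A": "GCN",  # Ala: GCA/GCC/GCG/GCT
    "L": "YTN",  # Leu approximation: also matches TTC/TTT (Phe)
}

# Complement of each IUPAC nucleotide code (A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H).
IUPAC_COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def back_translate(peptide: str) -> str:
    """Concatenate degenerate codons for each residue (sense strand, 5' to 3')."""
    return "".join(DEGENERATE_CODON[aa] for aa in peptide.upper())


def reverse_complement(seq: str) -> str:
    """Reverse complement of a sequence written in IUPAC codes."""
    return seq.upper().translate(IUPAC_COMPLEMENT)[::-1]


if __name__ == "__main__":
    sense = back_translate("GQQFAL")  # conserved HR2 motif cited in the text
    print("sense  :", sense)
    print("reverse:", reverse_complement(sense))
```

Because the primer anneals to the sense strand downstream of the Helix I forward primers, it must be written in the antisense orientation, which is why the reverse-complement step is needed.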

A similar approach was taken to clone CYP52A8A. A PCR fragment probe for
CYP52A8 was generated using primers for highly conserved sequences of the CYP52A3, CYP52A2
and CYP52A5 genes of C. tropicalis 750. The reverse primer (primer 2.3.5.M) (SEQ ID NO: 29)
was designed based on the highly conserved heme binding region (Table 4). The design of the
forward primer (primer 2,3,5,P) (SEQ ID NO: 28) was based upon a sequence conserved near
the N-terminus of the CYP52A3, CYP52A2 and CYP52A5 genes from C. tropicalis 750 (Table
4). Amplification of 20336 genomic DNA with these two primers gave a mixed PCR product.
One amplified PCR fragment was 1006 bp long (designated DCA1002). The DNA sequence of
this fragment was determined and found to have 85% identity to the DNA sequence of the
CYP52D4 gene of C. tropicalis 750. When this PCR product was used to screen the third
genomic library, one clone (pHKM12) was identified that contained a full-length CYP52 gene
along with 5' and 3' flanking sequences (Figure 9). The CYP52 gene included regulatory and
protein coding regions with an open reading frame 1539 nucleotides long, which encoded a
putative CYP52 protein of 512 amino acids (Figures 15 and 16). This gene was designated
CYP52A8A (SEQ ID NO: 92) since its amino acid sequence (SEQ ID NO: 102) was most
similar to the CYP52A8 protein of C. maltosa. The protein coding region of the CYP52A8A gene
is defined by nucleotides 464-2002 of SEQ ID NO: 92.
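The ORF and protein lengths quoted for these clones follow directly from the coding-region coordinates. A minimal sketch, assuming the coordinates are 1-based, inclusive, and run from the first base of the start codon to the last base of the stop codon (which the figures for CYP52A1A, nucleotides 1177-2748, and CYP52A8A, nucleotides 464-2002, are consistent with); the function name is illustrative.

```python
# Sketch: ORF length and encoded protein length from 1-based, inclusive
# coding-region coordinates that span start codon through stop codon.

def orf_stats(start: int, end: int) -> tuple[int, int]:
    """Return (ORF length in nt, protein length in aa excluding the stop codon)."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"{start}-{end} is not a whole number of codons")
    return length, length // 3 - 1


CODING_REGIONS = {
    "CYP52A1A (SEQ ID NO: 85)": (1177, 2748),
    "CYP52A8A (SEQ ID NO: 92)": (464, 2002),
}

if __name__ == "__main__":
    for gene, (start, end) in CODING_REGIONS.items():
        nt, aa = orf_stats(start, end)
        print(f"{gene}: {nt} nt ORF, {aa} aa")
```

This reproduces the 1572 nt / 523 aa and 1539 nt / 512 aa figures given in the text, and the divisibility check is a quick way to catch a mistyped coordinate.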

3) Cloning of CYP52D4A 

The screening of the second genomic library with the HemeBl (SEQ ID NO: 27)
primer (Table 4) yielded a clone carrying a plasmid (pPA18) that contained a truncated gene
having homology with the CYP52D4 gene of C. maltosa (Figure 33). A 1.3 to 1.5 kb
EcoRI-SstI fragment from pPA18 containing part of the truncated CYP gene was isolated and used
as a probe to screen the third genomic library for a full-length CYP52 gene. One clone (pHKM13)
was isolated and found to contain a full-length CYP gene with extensive 5' and 3' flanking
sequences (Figure 10). This gene has been designated CYP52D4A (SEQ ID NO: 94), and the
complete DNA including regulatory and protein coding regions (coding region defined by
nucleotides 767-2266) and putative amino acid sequence (SEQ ID NO: 104) of this gene are
shown in Figures 15 and 16. CYP52D4A (SEQ ID NO: 94) shares the greatest homology with
the CYP52D4 gene of C. maltosa.
4) Cloning of CYP52A2B and CYP52A8B

A mixed probe containing the CYP52A1A, A2A, A3A, D4A, A5A and A8A genes was
used to screen the third genomic library, and several putative positive clones were identified.
Seven of these were sequenced with the degenerate primers Cyp52a (SEQ ID NO: 32), Cyp52b
(SEQ ID NO: 33), Cyp52c (SEQ ID NO: 34) and Cyp52d (SEQ ID NO: 35) shown in Table 4.
These primers were designed from highly conserved regions of the four CYP52 subfamilies,
namely CYP52A, B, C and D. Sequences from two clones, pHKM14 and pHKM15 (Figures 11 and
12), shared considerable homology with the DNA sequences of the C. tropicalis 20336 CYP52A2
and CYP52A8 genes, respectively. The complete DNA (SEQ ID NO: 87) including regulatory
and protein coding regions (coding region defined by nucleotides 1072-2640) and putative amino
acid sequence (SEQ ID NO: 97) of the CYP52 gene present in pHKM14 suggested that it is
CYP52A2B (Figures 15 and 16). The complete DNA (SEQ ID NO: 93) including regulatory and
protein coding regions (coding region defined by nucleotides 1017-2555) and putative amino
acid sequence (SEQ ID NO: 103) of the CYP52 gene present in pHKM15 suggested that it is
CYP52A8B (Figures 15 and 16).



EXAMPLE 14

Identification of CYP and CPR Genes Induced by
Selected Fatty Acid and Alkane Substrates

Genes whose transcription is turned on by the presence of selected fatty
acid or alkane substrates have been identified using the QC-RT-PCR assay. This assay was used
to measure CYP and CPR gene expression in fermentor-grown cultures of C. tropicalis ATCC
20962. This method involves the isolation of total cellular RNA from cultures of C. tropicalis
and the quantification of a specific mRNA within that sample through the design and use of
sequence-specific QC-RT-PCR primers and an RNA competitor. Quantification is achieved
through the use of known concentrations of highly homologous competitor RNA in the
QC-RT-PCR reactions. The resulting QC-RT-PCR amplified cDNAs are separated and quantitated
through the use of ion-pairing reverse phase HPLC. This assay was used to characterize the
expression of CYP52 genes of C. tropicalis ATCC 20962 in response to various fatty acid and
alkane substrates. Genes which were induced were identified by calculating their mRNA
concentrations at various times before and after induction. Figure 18 provides an example of
how the concentration of mRNA for CYP52A5 can be calculated using the QC-RT-PCR assay.
The log ratio of unknown (U) to competitor product (C) is plotted versus the concentration of
competitor RNA present in the QC-RT-PCR reactions. The concentration of competitor which
results in a log ratio of U/C of zero represents the point where the unknown messenger RNA
concentration is equal to the concentration of the competitor. Figure 18 allows for the
calculation of the amount of CYP52A5 message present in 100 ng of total RNA isolated from
cell samples taken at 0, 1, and 2 hours after the addition of Emersol® 267 in a fermentor run.
From this analysis, it is possible to determine the concentration of the CYP52A5 mRNA present
in 100 ng of total cellular RNA. In the plot contained in Figure 18, it takes 0.46 pg of competitor
to equal the number of CYP52A5 mRNA molecules in 100 ng of RNA isolated from cells just
prior (time 0) to the addition of the substrate, Emersol® 267. In cell samples taken at one and
two hours after the addition of Emersol® 267, it takes 5.5 and 8.5 pg of competitor RNA,
respectively. This result demonstrates that CYP52A5 (SEQ ID NOS: 90 and 91) is induced more
than 18-fold (8.5 pg/0.46 pg ≈ 18.5) within two hours after the addition of Emersol® 267. This
type of analysis was used to demonstrate that CYP52A5 (SEQ ID NOS: 90 and 91) is induced by
Emersol® 267. Figure 19 shows the relative amounts of CYP52A5 (SEQ ID NOS: 90 and 91)
expression in fermentor runs with and without Emersol® 267 as a substrate. The differences in
the CYP52A5 (SEQ ID NOS: 90 and 91) expression patterns are due to the addition of
Emersol® 267 to the fermentation medium.
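The equivalence-point calculation can be sketched in a few lines. This is an illustrative reconstruction, not the data-module software used in the work: it assumes log10(U/C) is fitted by ordinary least squares against log10 of the competitor amount (the text states only that the log ratio is plotted against competitor concentration), and the peak areas in the example are synthetic values constructed so that U = C at 0.46 pg. The fold-induction step uses the 0.46, 5.5 and 8.5 pg equivalence points reported above.

```python
import math


def equivalence_point(competitor_pg, unknown_area, competitor_area):
    """Competitor amount (pg) at which log10(U/C) = 0, from a least-squares fit
    of log10(U/C) against log10(competitor amount)."""
    xs = [math.log10(c) for c in competitor_pg]
    ys = [math.log10(u / c) for u, c in zip(unknown_area, competitor_area)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Solve intercept + slope * log10(amount) = 0 for the competitor amount.
    return 10 ** (-intercept / slope)


def fold_induction(equiv_after_pg, equiv_before_pg):
    """Fold change in target mRNA, given equivalence points for equal total-RNA inputs."""
    return equiv_after_pg / equiv_before_pg


if __name__ == "__main__":
    # Synthetic, illustrative peak areas (not measured data): U/C = 0.46 / competitor.
    competitor = [0.1, 0.3, 1.0, 3.0]
    u_area = [0.46 / c for c in competitor]
    c_area = [1.0] * len(competitor)
    print(f"equivalence point: {equivalence_point(competitor, u_area, c_area):.2f} pg")
    # Equivalence points reported in the text at 0 h and 2 h after Emersol 267.
    print(f"fold induction at 2 h: {fold_induction(8.5, 0.46):.1f}")
```

Because every sample starts from the same 100 ng of total RNA, the ratio of equivalence points is directly the fold change in target mRNA, which is how the "more than 18-fold" figure arises.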

This analysis clearly demonstrates that expression of CYP52A5 (SEQ ID NOS: 90
and 91) in C. tropicalis 20962 is inducible by the addition of Emersol® 267 to the growth
medium. This analysis was also performed to characterize the expression of CYP52A2A (SEQ ID
NO: 86), CYP52A3A and CYP52A3B (SEQ ID NOS: 88 and 89), CYP52A8A (SEQ ID NO: 92),
CYP52A1A (SEQ ID NO: 85), CYP52D4A (SEQ ID NO: 94) and CPRB (SEQ ID NO: 82) in
response to the presence of Emersol® 267 in the fermentation medium (Figure 20). The results
of these analyses indicate that, like the CYP52A5 gene (SEQ ID NOS: 90 and 91) of C. tropicalis
20962, the CYP52A2A gene (SEQ ID NO: 86) is inducible by Emersol® 267. A small induction
is observed for CYP52A1A (SEQ ID NO: 85) and CYP52A8A (SEQ ID NO: 92). In contrast, any
induction of CYP52D4A (SEQ ID NO: 94), CYP52A3A (SEQ ID NO: 88) or CYP52A3B (SEQ ID
NO: 89) is below the level of detection of the assay. CPRB (SEQ ID NO: 82) is moderately
induced by Emersol® 267, four- to five-fold. The results of these analyses are summarized in
Figure 20. Figure 34 provides an example of selective induction of CYP52A genes. When pure
fatty acids or alkanes are spiked into a fermentor containing C. tropicalis 20962 or a derivative
thereof, the transcriptional activation of CYP52A genes can be detected using the QC-RT-PCR
assay. Figure 34 shows that pure oleic acid (C18:1) strongly induces CYP52A2A (SEQ ID NO:
86) while also inducing CYP52A5 (SEQ ID NOS: 90 and 91). In the same fermentor, addition of
pure alkane (tridecane) shows strong induction of both CYP52A2A (SEQ ID NO: 86) and
CYP52A1A (SEQ ID NO: 85). However, tridecane did not induce CYP52A5 (SEQ ID NOS: 90
and 91). In a separate fermentation using ATCC 20962 with pure octadecane as the substrate,
induction of CYP52A2A, CYP52A5A and CYP52A1A was detected (see Figure 36). The foregoing
demonstrates selective induction of particular CYP genes by specific substrates, thus providing
techniques for selective metabolic engineering of cell strains. For example, if tridecane
modification is desired, organisms engineered for high levels of CYP52A2A (SEQ ID NO: 86)
and CYP52A1A (SEQ ID NO: 85) activity are indicated. If oleic acid modification is desired,
organisms engineered for high levels of CYP52A2A (SEQ ID NO: 86) activity are indicated.
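The induction results above can be recorded as simple sets so that questions such as "which genes respond to every substrate tested?" are answered mechanically. This is a bookkeeping sketch only: it records whether induction was reported, not its strength; alleles are merged under one label (e.g. CYP52A5 for the CYP52A5A result with octadecane); and the helper names are hypothetical.

```python
# Induction reported in this Example (QC-RT-PCR, ATCC 20962 fermentations).
# Only "induction detected" is recorded; strength of induction is not captured.
INDUCED_BY = {
    "Emersol 267": {"CYP52A5", "CYP52A2A", "CYP52A1A", "CYP52A8A", "CPRB"},
    "oleic acid": {"CYP52A2A", "CYP52A5"},
    "tridecane": {"CYP52A2A", "CYP52A1A"},
    "octadecane": {"CYP52A2A", "CYP52A5", "CYP52A1A"},  # CYP52A5A reported
}


def induced_by_all(substrates):
    """Genes whose induction was detected with every listed substrate."""
    return set.intersection(*(INDUCED_BY[s] for s in substrates))


def induced_only_by(substrate):
    """Genes induced by `substrate` but by none of the other substrates listed."""
    others = set().union(*(g for s, g in INDUCED_BY.items() if s != substrate))
    return INDUCED_BY[substrate] - others


if __name__ == "__main__":
    print("induced by every substrate:", induced_by_all(INDUCED_BY))
    print("alkanes (C13, C18):", induced_by_all(["tridecane", "octadecane"]))
    print("only Emersol 267:", induced_only_by("Emersol 267"))
```

Under this simplified view, CYP52A2A is the one gene induced by every substrate examined, which is consistent with its appearing in both of the engineering recommendations above.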


EXAMPLE 15 

Integration of Selected CYP and CPR Genes 
into the Genome of Candida tropicalis 

In order to integrate selected genes into the chromosome of C. tropicalis 20336 or
its descendants, there has to be a target DNA sequence, which may or may not be an intact gene,
into which the genes can be inserted. There must also be a method to select for the integration
event. In some cases the target DNA sequence and the selectable marker are the same and, if so,
then there must also be a method to regain use of the target gene as a selectable marker following
the integration event. In C. tropicalis and its descendants, one gene which fits these criteria is
URA3A, encoding orotidine-5'-phosphate decarboxylase. Using it as a target for integration, ura-
variants of C. tropicalis can be transformed in such a way as to regenerate a URA+ genotype via
homologous recombination (Figure 21). Depending upon the design of the integration vector,
one or more genes can be integrated into the genome at the same time. Using a split URA3A
gene oriented as shown in Figure 22, homologous integration would yield at least one copy of the
gene(s) of interest inserted between the split portions of the URA3A gene. Moreover,
because of the high sequence similarity between the URA3A and URA3B genes, integration of the
construct can occur at both the URA3A and URA3B loci. Subsequently, an oligonucleotide
designed with a deletion in a portion of the URA gene, based on the sequence identical across
both the URA3A and URA3B genes, can be utilized to yield C. tropicalis transformants which
are once again ura- but which still carry one or more newly integrated genes of choice (Figure
21). ura- variants of C. tropicalis can also be isolated via other methods such as classical
mutagenesis or spontaneous mutation. Using well-established protocols, selection of ura-
strains can be facilitated by the use of 5-fluoroorotic acid (5-FOA) as described, e.g., in Boeke et
al., Mol. Gen. Genet. 197:345-346 (1984), incorporated herein by reference. The utility of this
approach for the manipulation of C. tropicalis has been well documented as described, e.g., in
Picataggio et al., Mol. and Cell. Biol. 11:4333-4339 (1991); Rohrer et al., Appl. Microbiol.
Biotechnol. 36:650-654 (1992); Picataggio et al., Bio/Technology 10:894-898 (1992); U.S. Patent
No. 5,648,247; U.S. Patent No. 5,620,878; U.S. Patent No. 5,204,252; U.S. Patent No.
5,254,466, all of which are incorporated herein by reference.

A. Construction of a URA Integration Vector, pURAin.

Primers were designed and synthesized based on the 1712 bp sequence of the
URA3A gene of C. tropicalis 20336 (see Figure 23). The nucleotide sequence of the URA3A
gene of C. tropicalis 20336 is set forth in SEQ ID NO: 105 and the amino acid sequence of the
encoded protein is set forth in SEQ ID NO: 106. URA3A Primer Set #1a (SEQ ID NO: 9) and
#1b (SEQ ID NO: 10) (Table 4) was used in PCR with C. tropicalis 20336 genomic DNA to
amplify URA3A sequences between nucleotides 733 and 1688 as shown in Figure 23. The
primers are designed to introduce unique 5' AscI and 3' PacI restriction sites into the resulting
amplified URA3A fragment. AscI and PacI sites were chosen because these sites are not present
within the CYP or CPR genes identified to date. URA3A Primer Set #2 was used in PCR with
C. tropicalis 20336 genomic DNA as a template to amplify URA3A sequences between
nucleotides 9 and 758 as shown in Figure 23. URA3A Primer Set #2a (SEQ ID NO: 11) and #2b
(SEQ ID NO: 12) (Table 4) was designed to introduce unique 5' PacI and 3' PmeI restriction
sites into the resulting amplified URA3A fragment. The PmeI site is also not present within the
CYP and CPR genes identified to date. The PCR fragments of the URA3A gene were purified,
restricted with AscI, PacI and PmeI restriction enzymes, and ligated to a gel-purified,
QiaexII-cleaned AscI-PmeI digest of plasmid pNEB193 (Figure 25) purchased from New England
Biolabs (Beverly, MA). The ligation was performed with an equimolar number of DNA termini at
16 °C for 16 hr using T4 DNA ligase (New England Biolabs). Ligations were transformed into
E. coli XL1-Blue cells (Stratagene, LaJolla, CA) according to the manufacturer's
recommendations. White colonies were isolated and grown, and plasmid DNA was isolated and
digested with AscI-PmeI to confirm insertion of the modified URA3A into pNEB193. The
resulting base integration vector was named pURAin (Figure 24).
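The choice of AscI, PacI and PmeI rests on those sites being absent from the genes to be cloned, so that digesting a PCR product at its added ends does not also cut inside it. A minimal sketch of that check follows, using the standard 8-bp recognition sequences of the three enzymes; all three are palindromic, so scanning one strand also covers the other. The toy sequence is illustrative only and is not a C. tropicalis sequence.

```python
# Sketch: scan a candidate insert for the 8-bp recognition sites used to
# build pURAin. All three sites are palindromic, so one strand suffices.
RECOGNITION_SITES = {
    "AscI": "GGCGCGCC",
    "PacI": "TTAATTAA",
    "PmeI": "GTTTAAAC",
}


def find_sites(seq: str, sites: dict[str, str] = RECOGNITION_SITES) -> dict[str, list[int]]:
    """Return 1-based start positions of each recognition site in seq."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, motif in sites.items():
        positions = []
        i = seq.find(motif)
        while i != -1:
            positions.append(i + 1)
            i = seq.find(motif, i + 1)  # step by one to catch overlapping matches
        hits[enzyme] = positions
    return hits


def is_safe_insert(seq: str, enzymes=("AscI", "PacI", "PmeI")) -> bool:
    """True if none of the given enzymes would cut inside seq."""
    hits = find_sites(seq)
    return all(not hits[e] for e in enzymes)


if __name__ == "__main__":
    toy = "AAGGCGCGCCTTTTAATTAAGG"  # illustrative toy sequence
    print(find_sites(toy))
    print(is_safe_insert(toy))
```

In practice the check would be run on the internal sequence of each amplified gene, excluding the engineered end sites.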

B. Amplification of CYP52A2A, CYP52A3A, CYP52A5A and
CPRB from C. tropicalis 20336 Genomic DNA

The genes encoding CYP52A2A (SEQ ID NO: 86) and CYP52A3A (SEQ ID NO:
88) from C. tropicalis 20336 were amplified from genomic clones (pPA15 and pPA57,
respectively) (Figures 26 and 29) via PCR using primers (Primer CYP 2A#1, SEQ ID NO: 1 and
Primer CYP 2A#2, SEQ ID NO: 2 for CYP52A2A; Primer CYP 3A#1, SEQ ID NO: 3 and
Primer CYP 3A#2, SEQ ID NO: 4 for CYP52A3A) designed to introduce PacI cloning sites.
These PCR primers were designed based upon the DNA sequence determined for CYP52A2A
(SEQ ID NO: 86) (Figure 15). The Amplify Gold PCR kit (Perkin Elmer Cetus, Foster City, CA)
was used according to the manufacturer's specifications. The CYP52A2A PCR amplification
product was 2,230 base pairs in length, yielding 496 bp of DNA upstream of the CYP52A2A start
codon and 168 bp downstream of the stop codon for the CYP52A2A ORF. The CYP52A3A PCR
amplification product was 2154 base pairs in length, yielding 437 bp of DNA upstream of the
CYP52A3A start codon and 97 bp downstream of the stop codon for the CYP52A3A ORF.

The gene encoding CYP52A5A (SEQ ID NO: 90) from C. tropicalis 20336 was
amplified from genomic DNA via PCR using primers (Primer CYP 5A#1, SEQ ID NO: 5 and
Primer CYP 5A#2, SEQ ID NO: 6) designed to introduce PacI cloning sites. These PCR primers
were designed based upon the DNA sequence determined for CYP52A5A (SEQ ID NO: 90). The
Expand Hi-Fi Taq PCR kit (Boehringer Mannheim, Indianapolis, IN) was used according to the
manufacturer's specifications. The CYP52A5A PCR amplification product was 3,298 base pairs
in length.
The gene encoding CPRB (SEQ ID NO: 82) from C. tropicalis 20336 was
amplified from genomic DNA via PCR using primers (CPR B#1, SEQ ID NO: 7 and CPR B#2,
SEQ ID NO: 8) based upon the DNA sequence determined for CPRB (SEQ ID NO: 82) (Figure
13). These primers were designed to introduce unique PacI cloning sites. The Expand Hi-Fi
Taq PCR kit (Boehringer Mannheim, Indianapolis, IN) was used according to the manufacturer's
specifications. The CPRB PCR product was 3266 bp in length, yielding 747 bp of DNA
upstream of the CPRB start codon and 493 bp downstream of the stop codon for the CPRB ORF.
The resulting PCR products were isolated via agarose gel electrophoresis, purified using QiaexII
and digested with PacI. The PCR fragments were purified, desalted and concentrated using a
Microcon 100 (Amicon, Beverly, MA).

The amplification procedures described above are applicable to the other genes
listed in Table 5 using the respectively indicated primers.

C. Cloning of CYP and CPR Genes into pURAin. 

The next step was to clone the selected CYP and CPR genes into the pURAin
integration vector. In a preferred aspect of the present invention, no foreign DNA other than that
specifically provided by synthetic restriction site sequences is incorporated into the DNA which
is cloned into the genome of C. tropicalis, i.e., with the exception of restriction site DNA, only
native C. tropicalis DNA sequences are incorporated into the genome. pURAin was digested
with PacI, cleaned with Qiaex II, and dephosphorylated with Shrimp Alkaline Phosphatase (SAP)
(United States Biochemical, Cleveland, OH) according to the manufacturer's recommendations.
Approximately 500 ng of PacI-linearized pURAin was dephosphorylated for 1 hr at 37 °C using
SAP at a concentration of 0.2 Units of enzyme per 1 pmol of DNA termini. The reaction was
stopped by heat inactivation at 65 °C for 20 min.
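The SAP dosing rule above (0.2 Units per pmol of DNA termini) depends on the molar amount of vector ends, which in turn depends on vector length. The following Python sketch illustrates that arithmetic. It assumes an average mass of about 650 g/mol per base pair and two termini per linear molecule. Because this section does not state the length of pURAin, the 5,000 bp value used below is only a placeholder.

```python
"""Sketch: estimate DNA termini and SAP units for a dephosphorylation reaction.

Assumptions (not from the source): ~650 g/mol per bp of double-stranded DNA,
and a linear molecule carrying two termini. The vector length is a placeholder.
"""

AVG_MASS_PER_BP = 650.0  # g/mol per base pair (approximation)


def pmol_termini(mass_ng: float, length_bp: int, ends_per_molecule: int = 2) -> float:
    """Picomoles of DNA termini in mass_ng of linear dsDNA of length_bp."""
    # ng -> g (1e-9) and mol -> pmol (1e12) combine into a factor of 1000.
    pmol_molecules = mass_ng * 1000.0 / (length_bp * AVG_MASS_PER_BP)
    return pmol_molecules * ends_per_molecule


def sap_units(mass_ng: float, length_bp: int, units_per_pmol_end: float = 0.2) -> float:
    """Units of SAP at the stated ratio of 0.2 U per pmol of DNA termini."""
    return units_per_pmol_end * pmol_termini(mass_ng, length_bp)


if __name__ == "__main__":
    VECTOR_BP = 5000  # hypothetical placeholder length for linearized pURAin
    ends = pmol_termini(500, VECTOR_BP)
    units = sap_units(500, VECTOR_BP)
    print(f"{ends:.3f} pmol termini -> {units:.3f} U SAP")
```

With the placeholder length, 500 ng corresponds to roughly 0.31 pmol of termini and about 0.06 U of SAP. Both figures scale inversely with the actual vector length.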

The CYP52A2A PacI fragment derived using the primers shown in Table 4 was
ligated to plasmid pURAin which had also been digested with PacI. PacI-digested pURAin was
dephosphorylated, and ligated to the CYP52A2A ULTMA PCR product as described previously.
The ligation mixture was transformed into E. coli XL1 Blue MRF' (Stratagene) and 2 resistant
colonies were selected and screened for correct constructs, which should contain vector sequence,
the inverted URA3A gene, and the amplified CYP52A2A gene (SEQ ID NO: 86) of 20336.
AscI-PmeI digestion identified one of the two constructs, plasmid pURA2in, as being correct (Figure
27). This plasmid was sequenced and compared to CYP52A2A (SEQ ID NO: 86) to confirm that
PCR did not introduce DNA base changes that would result in an amino acid change.

Prior to its use, the CPRB PacI fragment derived using the primers shown in
Table 4 was sequenced and compared to CPRB (SEQ ID NO: 82) to confirm that PCR did not
introduce DNA base pair changes that would result in an amino acid change. Following
confirmation, CPRB (SEQ ID NO: 82) was ligated to plasmid pURAin which had also been
digested with PacI. PacI-digested pURAin was dephosphorylated, and ligated to the CPR
Expand Hi-Fi PCR product as described previously. The ligation mixture was transformed into
E. coli XL1 Blue MRF' (Stratagene) and several resistant colonies were selected and screened
for correct constructs, which should contain vector sequence, the inverted URA3A gene, and the
amplified CPRB gene (SEQ ID NO: 82) of 20336. AscI-PmeI digestion confirmed a successful
construct, pURAREDBin.

In a manner similar to the above, each of the other CYP and CPR genes disclosed
herein is cloned into pURAin. PacI fragments of these genes, whose sequences are given in
Figures 13 and 15, are derivable by methods known to those skilled in the art.

1) Construction of Vectors Used to Generate HDC 20 and HDC 23 

A previously constructed integration vector containing CPRB (SEQ ID NO: 82),
pURAREDBin, was chosen as the starting vector. This vector was partially digested with PacI
and the linearized fragment was gel-isolated. The active PacI site was destroyed by treatment with
T4 DNA polymerase and the vector was re-ligated. Subsequent isolation and complete digestion
of this new plasmid yielded a vector now containing only one active PacI site. This fragment
was gel-isolated, dephosphorylated and ligated to the CYP52A2A PacI fragment. Vectors that
contain the CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes oriented in the
same direction, pURAin CPR 2A S, as well as in opposite directions (5' ends connected), pURAin
CPR 2A O, were generated.

D. Confirmation of CYP Integration (Figure 21 for Integration Scheme)
into the Genome of C. tropicalis

Based on the construct pURA2in, which was used to transform H5343 ura-, a scheme to
detect integration was devised. Genomic DNA from transformants was digested with DraIII
and SpeI, enzymes that cut within the URA3A and URA3B genes but not within the
integrated CYP52A2A gene. Digestion of genomic DNA in which an integration had occurred at
the URA3A or URA3B locus would be expected to yield a 3.5 kb or a 3.3 kb fragment,
respectively (Figure 28). Moreover, digestion of the same genomic DNA with PacI would yield
a 2.2 kb fragment characteristic of the integrated CYP52A2A gene (Figure 28). Southern
hybridizations of these digests with fragments of the CYP52A2A gene were used to screen for
these integration events. The intensity of the band signal from the Southern blot of the PacI digest was
used as a measure of the number of integration events (i.e., the more copies of the CYP52A2A
gene (SEQ ID NO: 86) present, the stronger the hybridization signal).
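The screening logic above can be expressed as a small lookup. A 3.5 kb SpeI/DraIII band points to integration at URA3A, a 3.3 kb band points to URA3B, and a 2.2 kb PacI band indicates the integrated CYP52A2A gene. The Python sketch below assumes that band sizes have already been estimated from the blot. The function names and the 0.08 kb matching tolerance are hypothetical choices. Bands of any other size, for example from the native CYP52A2A locus, are simply ignored.

```python
"""Sketch: interpret Southern band sizes from the integration screen.

Expected sizes come from the text; the tolerance is an assumed value.
"""
from typing import Dict, List

# SpeI/DraIII fragment expected for integration at each URA3 locus (kb).
EXPECTED_SPEI_DRAIII_KB = {"URA3A": 3.5, "URA3B": 3.3}
# PacI fragment characteristic of the integrated CYP52A2A gene (kb).
EXPECTED_PACI_KB = 2.2


def _matches(bands: List[float], target: float, tol_kb: float) -> bool:
    """True if any observed band lies within tol_kb of the target size."""
    return any(abs(band - target) <= tol_kb for band in bands)


def classify_integration(spe_dra_bands_kb: List[float],
                         paci_bands_kb: List[float],
                         tol_kb: float = 0.08) -> Dict[str, object]:
    """Report which URA3 loci show an integration band and whether the insert is seen."""
    loci = [locus for locus, size in EXPECTED_SPEI_DRAIII_KB.items()
            if _matches(spe_dra_bands_kb, size, tol_kb)]
    return {
        "integration_loci": loci,
        "cyp52a2a_insert_detected": _matches(paci_bands_kb, EXPECTED_PACI_KB, tol_kb),
    }


if __name__ == "__main__":
    # Hypothetical band sizes for one transformant.
    print(classify_integration([3.5, 6.0], [2.2]))
```

The tolerance is kept below half of the 0.2 kb spacing between the two URA3 bands. This prevents a single band from being assigned to both loci.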

C. tropicalis H5343 transformed URA prototrophs were grown at 30 °C, 170 rpm,
in 10 ml SC-uracil media for preparation of genomic DNA. Genomic DNA was isolated by the
method described previously. Genomic DNA was digested with SpeI and DraIII. A 0.95%
agarose gel was used to prepare a Southern hybridization blot. The DNA from the gel was
transferred to a MagnaCharge nylon filter membrane (MSI Technologies, Westboro, MA)
according to the alkaline transfer method of Sambrook et al., supra. For the Southern
hybridization, a 2.2 kb CYP52A2A DNA fragment was used as a hybridization probe. 300 ng of
CYP52A2A DNA was labeled using an ECL Direct labeling and detection system (Amersham) and
the Southern blot was processed according to the ECL kit specifications. The blot was processed in a
volume of 30 ml of hybridization fluid, corresponding to 0.125 ml/cm². Following a
prehybridization at 42 °C for 1 hr, 300 ng of CYP52A2A probe was added and the hybridization
continued for 16 hr at 42 °C. Following hybridization, the blots were washed two times for 20
min each at 42 °C in primary wash containing urea. Two 5-min secondary washes at RT were
conducted, followed by detection according to the kit directions. The blots were exposed for 16 hr
as recommended.
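The volumes in this protocol are linked by simple ratios. 30 ml at 0.125 ml/cm² implies a membrane area of about 240 cm², and 300 ng of probe in 30 ml gives 10 ng/ml. A minimal Python helper for rescaling these quantities to a different blot size is sketched below. The function names are hypothetical, and holding the probe concentration constant when rescaling is an assumption rather than something the text specifies.

```python
"""Sketch: rescale hybridization volume and probe amount to membrane area.

Ratios are taken from the protocol text (0.125 ml/cm^2; 300 ng probe in 30 ml).
Holding the probe concentration constant is an assumption.
"""

ML_PER_CM2 = 0.125          # hybridization fluid per cm^2 of membrane
PROBE_NG_PER_ML = 300 / 30  # 10 ng of labeled probe per ml of fluid


def membrane_area_cm2(volume_ml: float, ml_per_cm2: float = ML_PER_CM2) -> float:
    """Membrane area implied by a given hybridization volume."""
    return volume_ml / ml_per_cm2


def hyb_volume_ml(area_cm2: float, ml_per_cm2: float = ML_PER_CM2) -> float:
    """Hybridization volume needed for a membrane of area_cm2."""
    return area_cm2 * ml_per_cm2


def probe_ng(volume_ml: float, ng_per_ml: float = PROBE_NG_PER_ML) -> float:
    """Probe mass that keeps the stated probe concentration."""
    return volume_ml * ng_per_ml


if __name__ == "__main__":
    print(membrane_area_cm2(30))   # area implied by the 30 ml used above
    vol = hyb_volume_ml(100)       # a hypothetical 100 cm^2 blot
    print(vol, probe_ng(vol))
```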

Integration was confirmed by the detection of a 3.5 kb SpeI/DraIII fragment in
the genomic DNA of the transformants but not in the C. tropicalis 20336 control.
Subsequently, PacI digestion of the genomic DNA of the positive transformants, followed by
Southern hybridization using a CYP52A2A gene probe, confirmed integration by the detection
of a 2.2 kb fragment. The resulting CYP52A2A integrated strain was named HDC1 (see Table 1).
In a manner similar to the above, each of the genes contained in the PacI
fragments described in Section 3c above was confirmed for integration into the
genome of C. tropicalis.

Transformants generated by transformation with the vectors pURAin CPR 2A S
or pURAin CPR 2A O were analyzed by Southern hybridization for tandem integration of both the
CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes. Three strains were
generated in which the CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes
are integrated in the opposite orientation (HDC 20-1, HDC 20-2 and HDC 20-3) and three were
generated with the CYP52A2A (SEQ ID NO: 86) and CPRB (SEQ ID NO: 82) genes integrated
in the same orientation (HDC 23-1, HDC 23-2 and HDC 23-3) (Table 1).

E. Confirmation of CPRB Integration into H5343 ura-

Seven transformants were screened by colony PCR using CPRB primer #2 (SEQ
ID NO: 8) and a URA3A-specific primer. In five of the transformants, successful integration
was detected by the presence of a 3899 bp PCR product. This 3899 bp PCR product represents
the CPRB gene adjacent to the URA3A gene in the genome of H5343, thereby confirming
integration. The resulting CPRB integrated strains were named HDC10-1 and HDC10-2 (see
Table 1).
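Scoring the colony PCR screen amounts to checking each transformant for a product near 3899 bp, which is the size expected when CPRB is integrated next to URA3A. The Python sketch below takes product sizes that have already been read from a gel. The colony labels, the sizes, and the ±5% sizing tolerance are hypothetical.

```python
"""Sketch: score colony PCR results for the CPRB-URA3A junction product.

The 3899 bp target comes from the text; the 5% tolerance is an assumption.
"""
from typing import Dict, List

EXPECTED_PRODUCT_BP = 3899


def score_colonies(products_bp: Dict[str, List[int]],
                   expected_bp: int = EXPECTED_PRODUCT_BP,
                   tol_fraction: float = 0.05) -> Dict[str, bool]:
    """Map each colony to True if it shows a product of the expected size."""
    tol = expected_bp * tol_fraction
    return {colony: any(abs(size - expected_bp) <= tol for size in sizes)
            for colony, sizes in products_bp.items()}


if __name__ == "__main__":
    # Hypothetical gel readings: one product list per colony (empty = no band).
    gel = {"T1": [3900], "T2": [], "T3": [1200]}
    results = score_colonies(gel)
    positives = [c for c, ok in results.items() if ok]
    print(f"{len(positives)} of {len(results)} positive: {positives}")
```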

F. Strain Evaluation.

As determined by quantitative PCR, when compared to parent H5343, HDC10-1
contained three additional copies of the reductase gene and HDC10-2 contained four additional
copies of the reductase gene. Evaluations of HDC20-1, HDC20-2 and HDC20-3 based on
Southern hybridization data indicate that HDC20-1 contained multiple integrations, i.e., 2 to 3
times as many as HDC20-2 or HDC20-3. Evaluations of HDC23-1, HDC23-2 and HDC23-3 based
on Southern hybridization data indicate that HDC23-3 contained multiple integrations, i.e., 2 to
3 times as many as HDC23-1 or HDC23-2. The data in Table 8 indicate that integration of
components of the ω-hydroxylase complex has a positive effect on the improvement of
Candida tropicalis ATCC 20962 as a biocatalyst. The results indicate that CYP52A5A (SEQ ID
NO: 90) is an important gene for the conversion of oleic acid to diacid. Surprisingly, tandem
integrations of CYP and CPR genes oriented in the opposite direction (HDC 20 strains) seem to
be less productive than tandem integrations oriented in the same direction (HDC 23 strains)
(Tables 1 and 8).
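Relative comparisons such as "2 to 3 times" the integrations of a sibling strain come from comparing hybridization signal between lanes. The Python sketch below shows one way to make that comparison explicit: each band signal is divided by a loading reference before the ratio is taken. The densitometry values and the loading normalization are assumptions for illustration. The estimate is meaningful only while the detected signal stays roughly proportional to copy number, since chemiluminescent detection can saturate.

```python
"""Sketch: fold difference in integration signal between two strains.

Signal and loading values are hypothetical densitometry readings; the method
assumes signal is roughly linear in gene copy number.
"""


def normalized_signal(band_signal: float, loading_signal: float) -> float:
    """Band signal corrected for how much DNA was loaded in the lane."""
    if loading_signal <= 0:
        raise ValueError("loading_signal must be positive")
    return band_signal / loading_signal


def fold_difference(signal_a: float, signal_b: float,
                    loading_a: float = 1.0, loading_b: float = 1.0) -> float:
    """How many times stronger strain A's normalized band is than strain B's."""
    return normalized_signal(signal_a, loading_a) / normalized_signal(signal_b, loading_b)


if __name__ == "__main__":
    # Hypothetical readings for two strains with equal loading.
    print(round(fold_difference(2500.0, 1000.0), 2))
```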

CHART

Media Composition

NZCYM Broth
   Bacto Casein Digest                  10 g
   Bacto Casamino Acids                  1 g
   Bacto Yeast Extract                   5 g
   Sodium Chloride                       5 g
   Magnesium Sulfate (anhydrous)      0.98 g
   Distilled Water                  1,000 ml

NZCYM Agar
   Bacto Casein Digest                  10 g
   Bacto Casamino Acids                  1 g
   Bacto Yeast Extract                   5 g
   Sodium Chloride                       5 g
   Magnesium Sulfate (anhydrous)      0.98 g
   Agar                                 15 g
   Distilled Water                  1,000 ml

NZCYM Top Agarose
   Bacto Casein Digest                  10 g
   Bacto Casamino Acids                  1 g
   Bacto Yeast Extract                   5 g
   Sodium Chloride                       5 g
   Magnesium Sulfate (anhydrous)      0.98 g
   Agarose                               7 g
   Distilled Water                  1,000 ml

LB Broth
   Bacto Tryptone                       10 g
   Bacto Yeast Extract                   5 g
   Sodium Chloride                      10 g
   Distilled Water                  1,000 ml

LB Agar
   Bacto Tryptone                       10 g
   Bacto Yeast Extract                   5 g
   Sodium Chloride                      10 g
   Agar                                 15 g
   Distilled Water                  1,000 ml

LB Top Agarose
   Bacto Tryptone                       10 g
   Bacto Yeast Extract                   5 g
   Sodium Chloride                      10 g
   Agarose                               7 g
   Distilled Water                  1,000 ml

YEPD Broth
   Bacto Yeast Extract                  10 g
   Bacto Peptone                        20 g
   Glucose                              20 g
   Distilled Water                  1,000 ml

YEPD Agar
   Bacto Yeast Extract                  10 g
   Bacto Peptone                        20 g
   Glucose                              20 g
   Agar                                 20 g
   Distilled Water                  1,000 ml

SC-uracil*
   Bacto-yeast nitrogen base without amino acids     6.7 g
   Glucose                                            20 g
   Bacto-agar                                         20 g
   Drop-out mix*                                       2 g
   Distilled water                                 1,000 ml
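Each recipe in the chart is given per 1,000 ml. When a different batch size is needed, every dry component scales linearly with the final volume, as in the Python sketch below. Only two of the recipes are transcribed here, and the dictionary and function names are illustrative.

```python
"""Sketch: scale per-liter media recipes from the chart to another volume.

Only two recipes are transcribed here; amounts are grams per 1,000 ml.
"""
from typing import Dict

RECIPES_G_PER_L: Dict[str, Dict[str, float]] = {
    "YEPD Broth": {"Bacto Yeast Extract": 10.0, "Bacto Peptone": 20.0, "Glucose": 20.0},
    "LB Agar": {"Bacto Tryptone": 10.0, "Bacto Yeast Extract": 5.0,
                "Sodium Chloride": 10.0, "Agar": 15.0},
}


def scale_recipe(recipe_g_per_l: Dict[str, float], volume_ml: float) -> Dict[str, float]:
    """Grams of each component for volume_ml of medium (water to final volume)."""
    factor = volume_ml / 1000.0
    return {component: grams * factor for component, grams in recipe_g_per_l.items()}


if __name__ == "__main__":
    for name, grams in scale_recipe(RECIPES_G_PER_L["YEPD Broth"], 250).items():
        print(f"{name}: {grams:g} g")
```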
PCA2 medium                                      g/l
   Peptone                                       3.0
   Yeast Extract                                 6.0
   Sodium Acetate                                3.0
   Yeast Nitrogen Base (Difco)                   6.7
   Glucose (anhydrous)                          50.0
   Potassium Phosphate (dibasic, trihydrate)     7.2
   Potassium Phosphate (monobasic, anhydrous)    9.3

DCA3 medium                                      g/l
   0.3 M phosphate buffer, pH 7.5, containing:
   Glycerol                                     50
   Yeast Nitrogen Base (Difco)                   6.7



Drop-out mix
   Adenine                        0.5 g
   Alanine                          2 g
   Arginine                         2 g
   Asparagine                       2 g
   Aspartic acid                    2 g
   Cysteine                         2 g
   Glutamine                        2 g
   Glutamic acid                    2 g
   Glycine                          2 g
   Histidine                        2 g
   Inositol                         2 g
   Isoleucine                       2 g
   Leucine                         10 g
   Lysine                           2 g
   Methionine                       2 g
   para-Aminobenzoic acid         0.2 g
   Phenylalanine                    2 g
   Proline                          2 g
   Serine                           2 g
   Threonine                        2 g
   Tryptophan                       2 g
   Tyrosine                         2 g
   Valine                           2 g

* See Kaiser et al., Methods in Yeast Genetics, Cold Spring Harbor Laboratory Press, USA (1994), incorporated herein by
reference.
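SC-uracil contains 2 g of this drop-out mix per liter, so the final concentration of each supplement is its share of the mix multiplied by 2 g. The Python sketch below performs that calculation from the amounts listed above. It assumes that the mix is blended homogeneously, which the text does not state. Uracil is absent from the list, which is why this medium selects for URA prototrophs.

```python
"""Sketch: per-liter supplement concentrations in SC-uracil from the drop-out mix.

Amounts are grams of each component in the mix, as listed above; uniform
blending of the mix is an assumption.
"""
from typing import Dict

DROP_OUT_MIX_G: Dict[str, float] = {
    "Adenine": 0.5, "Alanine": 2.0, "Arginine": 2.0, "Asparagine": 2.0,
    "Aspartic acid": 2.0, "Cysteine": 2.0, "Glutamine": 2.0, "Glutamic acid": 2.0,
    "Glycine": 2.0, "Histidine": 2.0, "Inositol": 2.0, "Isoleucine": 2.0,
    "Leucine": 10.0, "Lysine": 2.0, "Methionine": 2.0,
    "para-Aminobenzoic acid": 0.2, "Phenylalanine": 2.0, "Proline": 2.0,
    "Serine": 2.0, "Threonine": 2.0, "Tryptophan": 2.0, "Tyrosine": 2.0, "Valine": 2.0,
}
MIX_G_PER_L = 2.0  # drop-out mix added per liter of SC-uracil


def per_liter(mix: Dict[str, float], grams_used: float = MIX_G_PER_L) -> Dict[str, float]:
    """Grams per liter of each component when grams_used of the mix is added."""
    total = sum(mix.values())
    return {name: grams_used * amount / total for name, amount in mix.items()}


if __name__ == "__main__":
    conc = per_liter(DROP_OUT_MIX_G)
    print(f"total mix: {sum(DROP_OUT_MIX_G.values()):.1f} g")
    for name in ("Leucine", "Adenine", "Tryptophan"):
        print(f"{name}: {conc[name] * 1000:.0f} mg/l")
```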
It will be understood that various modifications may be made to the embodiments
and/or examples disclosed herein. Thus, the above description should not be construed as 
limiting, but merely as exemplifications of preferred embodiments. Those skilled in the art will 
envision other modifications within the scope and spirit of the claims appended hereto. 
WHAT IS CLAIMED IS:




1. Isolated nucleic acid encoding a CPRA protein having the amino
acid sequence set forth in SEQ ID NO: 83.

2. Isolated nucleic acid comprising a coding region defined by nucleotides 1006-3042
as set forth in SEQ ID NO: 81.

3. Isolated nucleic acid according to claim 2 comprising the nucleotide sequence 
as set forth in SEQ ID NO: 81. 

4. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 83.

5. A vector comprising a nucleotide sequence encoding CPRA protein including 
an amino acid sequence as set forth in SEQ ID NO: 83. 

6. A vector according to claim 5 wherein the nucleotide sequence is set forth in
nucleotides 1006-3042 of SEQ ID NO: 81.

7. A vector according to claim 5 wherein the vector is selected from the group 
consisting of plasmid, phagemid, phage and cosmid. 

8. A host cell transfected or transformed with the nucleic acid of claim 1. 

9. A host cell according to claim 8 wherein the host cell is a yeast cell. 

10. A host cell according to claim 9 wherein the yeast cell is a Candida sp.

11. A host cell according to claim 10 wherein the Candida sp. is Candida tropicalis.

12. A host cell according to claim 11 wherein the Candida tropicalis is Candida
tropicalis 20336.

13. A host cell according to claim 12 wherein the Candida tropicalis is H5343 ura-.

14. A method of producing a CPRA protein including an amino acid sequence as 
set forth in SEQ ID NO: 83 comprising: 

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 83; and

b) culturing the cell under conditions favoring the expression of the protein. 

15. The method according to claim 14 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.


16. Isolated nucleic acid encoding a CPRB protein having the amino acid 
sequence set forth in SEQ ID NO: 84. 

17. Isolated nucleic acid comprising a coding region defined by nucleotides 1033-3069
as set forth in SEQ ID NO: 82.

18. Isolated nucleic acid according to claim 17 comprising the nucleotide
sequence as set forth in SEQ ID NO: 82.

19. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 84.

20. A vector comprising a nucleotide sequence encoding CPRB protein including 
an amino acid sequence as set forth in SEQ ID NO: 84. 




22. A vector according to claim 20 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

23. A host cell transfected or transformed with the nucleic acid of claim 16. 

24. A host cell according to claim 23 wherein the host cell is a yeast cell. 

25. A host cell according to claim 24 wherein the yeast cell is a Candida sp. 

26. A host cell according to claim 25 wherein the Candida sp. is Candida tropicalis.



27. A host cell according to claim 26 wherein the Candida tropicalis is Candida 
tropicalis 20336. 

28. A host cell according to claim 27 wherein the Candida tropicalis is H5343 ura-.



29. A method of producing a CPRB protein including an amino acid sequence as
set forth in SEQ ID NO: 84 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 84; and 

b) culturing the cell under conditions favoring the expression of the protein. 

30. The method according to claim 29 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

31. Isolated nucleic acid encoding a CYP52A1A protein having the amino acid
sequence set forth in SEQ ID NO: 95.

32. Isolated nucleic acid comprising a coding region defined by nucleotides 1177-2748
as set forth in SEQ ID NO: 85.
33. Isolated nucleic acid according to claim 32 comprising the nucleotide
sequence as set forth in SEQ ID NO: 85.



34. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 95.

35. A vector comprising a nucleotide sequence encoding CYP52A1A protein
including an amino acid sequence as set forth in SEQ ID NO: 95.

36. A vector according to claim 35 wherein the nucleotide sequence is set forth in
nucleotides 1177-2748 of SEQ ID NO: 85.

37. A vector according to claim 35 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.






38. A host cell transfected or transformed with the nucleic acid of claim 31. 



39. A host cell according to claim 38 wherein the host cell is a yeast cell. 






40. A host cell according to claim 39 wherein the yeast cell is a Candida sp.



41. A host cell according to claim 40 wherein the Candida sp. is Candida tropicalis.






42. A host cell according to claim 41 wherein the Candida tropicalis is Candida 



tropicalis 20336. 



43. A host cell according to claim 42 wherein the Candida tropicalis is H5343 ura-.
44. A method of producing a CYP52A1A protein including an amino acid
sequence as set forth in SEQ ID NO: 95 comprising: 

a) transforming a suitable host cell with a DNA sequence that encodes the protein 
having the amino acid sequence as set forth in SEQ ID NO: 95; and 
b) culturing the cell under conditions favoring the expression of the protein.

45. The method according to claim 44 wherein the step of culturing the cell 
comprises adding an organic substrate to media containing the cell. 

46. Isolated nucleic acid encoding a CYP52A2A protein having the amino acid
sequence set forth in SEQ ID NO: 96.

47. Isolated nucleic acid comprising a coding region defined by nucleotides 1199-2767
as set forth in SEQ ID NO: 86.

48. Isolated nucleic acid according to claim 47 comprising the nucleotide 
sequence as set forth in SEQ ID NO: 86. 

49. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 96.

50. A vector comprising a nucleotide sequence encoding CYP52A2A protein 
including an amino acid sequence as set forth in SEQ ID NO: 96. 

51. A vector according to claim 50 wherein the nucleotide sequence is set forth in
nucleotides 1199-2767 of SEQ ID NO: 86.

52. A vector according to claim 50 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

53. A host cell transfected or transformed with the nucleic acid of claim 46. 
54. A host cell according to claim 53 wherein the host cell is a yeast cell.

55. A host cell according to claim 54 wherein the yeast cell is a Candida sp.

56. A host cell according to claim 55 wherein the Candida sp. is Candida tropicalis.



57. A host cell according to claim 56 wherein the Candida tropicalis is Candida 
tropicalis 20336. 

58. A host cell according to claim 57 wherein the Candida tropicalis is H5343 ura-.

59. A method of producing a CYP52A2A protein including an amino acid
sequence as set forth in SEQ ID NO: 96 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein 
having the amino acid sequence as set forth in SEQ ID NO: 96; and 

b) culturing the cell under conditions favoring the expression of the protein. 

60. The method according to claim 59 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

61. Isolated nucleic acid encoding a CYP52A2B protein having the amino acid 
sequence set forth in SEQ ID NO: 97. 

62. Isolated nucleic acid comprising a coding region defined by nucleotides 1072-2640
as set forth in SEQ ID NO: 87.



63. Isolated nucleic acid according to claim 62 comprising the nucleotide sequence
as set forth in SEQ ID NO: 87.



64. Isolated protein comprising an amino acid sequence as set forth in SEQ ID 
NO: 97. 

65. A vector comprising a nucleotide sequence encoding CYP52A2B protein
including an amino acid sequence as set forth in SEQ ID NO: 97.

66. A vector according to claim 65 wherein the nucleotide sequence is set forth in 
nucleotides 1072-2640 of SEQ ID NO: 87. 

67. A vector according to claim 65 wherein the vector is selected from the group
consisting of plasmid, phagemid, phage and cosmid.

68. A host cell transfected or transformed with the nucleic acid of claim 61. 
69. A host cell according to claim 68 wherein the host cell is a yeast cell.

70. A host cell according to claim 69 wherein the yeast cell is a Candida sp. 

71. A host cell according to claim 70 wherein the Candida sp. is Candida tropicalis.

72. A host cell according to claim 71 wherein the Candida tropicalis is Candida 
tropicalis 20336. 

73. A host cell according to claim 72 wherein the Candida tropicalis is H5343 ura-.

74. A method of producing a CYP52A2B protein including an amino acid 
sequence as set forth in SEQ ID NO: 97 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 97; and

b) culturing the cell under conditions favoring the expression of the protein. 
75. The method according to claim 74 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell. 

76. Isolated nucleic acid encoding a CYP52A3A protein having the amino acid
sequence set forth in SEQ ID NO: 98.

77. Isolated nucleic acid comprising a coding region defined by nucleotides 1126-2748
as set forth in SEQ ID NO: 88.

78. Isolated nucleic acid according to claim 77 comprising the nucleotide sequence
as set forth in SEQ ID NO: 88.

79. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 98.

80. A vector comprising a nucleotide sequence encoding CYP52A3A protein 
including an amino acid sequence as set forth in SEQ ID NO: 98. 

81. A vector according to claim 80 wherein the nucleotide sequence is set forth in
nucleotides 1126-2748 of SEQ ID NO: 88.

82. A vector according to claim 80 wherein the vector is selected from the group 
consisting of plasmid, phagemid, phage and cosmid. 

83. A host cell transfected or transformed with the nucleic acid of claim 76.

84. A host cell according to claim 83 wherein the host cell is a yeast cell. 

85. A host cell according to claim 84 wherein the yeast cell is a Candida sp. 

86. A host cell according to claim 85 wherein the Candida sp. is Candida tropicalis.

87. A host cell according to claim 86 wherein the Candida tropicalis is Candida 
tropicalis 20336. 

88. A host cell according to claim 87 wherein the Candida tropicalis is H5343 ura-.

89. A method of producing a CYP52A3A protein including an amino acid 
sequence as set forth in SEQ ID NO: 98 comprising: 

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 98; and

b) culturing the cell under conditions favoring the expression of the protein. 

90. The method according to claim 89 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

91. Isolated nucleic acid encoding a CYP52A3B protein having the amino acid
sequence as set forth in SEQ ID NO: 99.



92. Isolated nucleic acid comprising a coding region defined by nucleotides 913-2535
as set forth in SEQ ID NO: 89.

93. Isolated nucleic acid according to claim 92 comprising the nucleotide 
sequence as set forth in SEQ ID NO: 89. 

94. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 99.



95. A vector comprising a nucleotide sequence encoding CYP52A3B protein
including an amino acid sequence as set forth in SEQ ID NO: 99.

96. A vector according to claim 95 wherein the nucleotide sequence is set forth in
nucleotides 913-2535 of SEQ ID NO: 89.

97. A vector according to claim 95 wherein the vector is selected from the group 
consisting of plasmid, phagemid, phage and cosmid. 

98. A host cell transfected or transformed with the nucleic acid of claim 91.

99. A host cell according to claim 98 wherein the host cell is a yeast cell. 

100. A host cell according to claim 99 wherein the yeast cell is a Candida sp.

101. A host cell according to claim 100 wherein the Candida sp. is Candida tropicalis.

102. A host cell according to claim 101 wherein the Candida tropicalis is 
Candida tropicalis 20336. 

103. A host cell according to claim 102 wherein the Candida tropicalis is H5343 ura-.

104. A method of producing a CYP52A3B protein including an amino acid
sequence as set forth in SEQ ID NO: 99 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein 
having the amino acid sequence as set forth in SEQ ID NO: 99; and 

b) culturing the cell under conditions favoring the expression of the protein. 

105. The method according to claim 104 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

106. Isolated nucleic acid encoding a CYP52A5A protein having the amino acid
sequence set forth in SEQ ID NO: 100.

107. Isolated nucleic acid comprising a coding region defined by nucleotides
1103-2656 as set forth in SEQ ID NO: 90.



108. Isolated nucleic acid according to claim 107 comprising the nucleotide
sequence as set forth in SEQ ID NO: 90.



109. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 100.



110. A vector comprising a nucleotide sequence encoding CYP52A5A protein
including an amino acid sequence as set forth in SEQ ID NO: 100.

111. A vector according to claim 110 wherein the nucleotide sequence is set forth
in nucleotides 1103-2656 of SEQ ID NO: 90.

112. A vector according to claim 110 wherein the vector is selected from the
group consisting of plasmid, phagemid, phage and cosmid.

113. A host cell transfected or transformed with the nucleic acid of claim 106.

114. A host cell according to claim 113 wherein the host cell is a yeast cell.

115. A host cell according to claim 114 wherein the yeast cell is a Candida sp.

116. A host cell according to claim 115 wherein the Candida sp. is Candida tropicalis.

117. A host cell according to claim 116 wherein the Candida tropicalis is
Candida tropicalis 20336.

118. A host cell according to claim 117 wherein the Candida tropicalis is H5343 ura-.

119. A method of producing a CYP52A5A protein including an amino acid
sequence as set forth in SEQ ID NO: 100 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein 
having the amino acid sequence as set forth in SEQ ID NO: 100; and 

b) culturing the cell under conditions favoring the expression of the protein. 

120. The method according to claim 119 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

121. Isolated nucleic acid encoding a CYP52A5B protein having the amino acid
sequence as set forth in SEQ ID NO: 101.

122. Isolated nucleic acid comprising a coding region defined by nucleotides
1142-2695 as set forth in SEQ ID NO: 91.

123. Isolated nucleic acid according to claim 122 comprising the nucleotide
sequence as set forth in SEQ ID NO: 91.



124. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 101.



125. A vector comprising a nucleotide sequence encoding CYP52A5B protein
including the amino acid sequence as set forth in SEQ ID NO: 101.

126. A vector according to claim 125 wherein the nucleotide sequence is set forth
in nucleotides 1142-2695 of SEQ ID NO: 91.

127. A vector according to claim 125 wherein the vector is selected from the 
group consisting of plasmid, phagemid, phage and cosmid. 

128. A host cell transfected or transformed with the nucleic acid of claim 121.

129. A host cell according to claim 128 wherein the host cell is a yeast cell.

130. A host cell according to claim 129 wherein the yeast cell is a Candida sp.

131. A host cell according to claim 130 wherein the Candida sp. is Candida tropicalis.

132. A host cell according to claim 131 wherein the Candida tropicalis is 
Candida tropicalis 20336. 

133. A host cell according to claim 132 wherein the Candida tropicalis is H5343 ura-.

134. A method of producing a CYP52A5B protein including an amino acid 
sequence as set forth in SEQ ID NO: 101 comprising: 

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 101; and

b) culturing the cell under conditions favoring the expression of the protein. 

135. The method according to claim 134 wherein the step of culturing the cell 
comprises adding an organic substrate to media containing the cell. 

136. Isolated nucleic acid encoding a CYP52A8A protein having the amino acid
sequence set forth in SEQ ID NO: 102.

137. Isolated nucleic acid comprising a coding region defined by nucleotides 464-2002
as set forth in SEQ ID NO: 92.

138. Isolated nucleic acid according to claim 137 comprising the nucleotide 
sequence as set forth in SEQ ID NO: 92. 



139. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 102.

140. A vector comprising a nucleotide sequence encoding CYP52A8A protein
including an amino acid sequence as set forth in SEQ ID NO: 102. 

141. A vector according to claim 140 wherein the nucleotide sequence is set forth
in nucleotides 464-2002 of SEQ ID NO: 92.

142. A vector according to claim 140 wherein the vector is selected from the 
group consisting of plasmid, phagemid, phage and cosmid. 



143. A host cell transfected or transformed with the nucleic acid of claim 136.

144. A host cell according to claim 143 wherein the host cell is a yeast cell.

145. A host cell according to claim 144 wherein the yeast cell is a Candida sp.

146. A host cell according to claim 145 wherein the Candida sp. is Candida tropicalis.



147. A host cell according to claim 146 wherein the Candida tropicalis is
Candida tropicalis 20336.

148. A host cell according to claim 147 wherein the Candida tropicalis is H5343 ura-.



149. A method of producing a CYP52A8A protein including an amino acid
sequence as set forth in SEQ ID NO: 102 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 102; and

b) culturing the cell under conditions favoring the expression of the protein. 



150. The method according to claim 149 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.



151. Isolated nucleic acid encoding a CYP52A8B protein having the amino acid 
sequence set forth in SEQ ID NO: 103. 

152. Isolated nucleic acid comprising a coding region defined by nucleotides 
1017-2555 as set forth in SEQ ID NO: 93. 

153. Isolated nucleic acid according to claim 152 comprising the nucleotide 
sequence as set forth in SEQ ID NO: 93. 



154. Isolated protein comprising an amino acid sequence as set forth in SEQ ID NO: 103.



155. A vector comprising a nucleotide sequence encoding CYP52A8B protein 
including an amino acid sequence as set forth in SEQ ID NO: 103. 

156. A vector according to claim 155 wherein the nucleotide sequence is set forth 
in nucleotides 1017-2555 of SEQ ID NO: 93. 

157. A vector according to claim 155 wherein the vector is selected from the 
group consisting of plasmid, phagemid, phage and cosmid. 

158. A host cell transfected or transformed with the nucleic acid of claim 151.

159. A host cell according to claim 158 wherein the host cell is a yeast cell. 

160. A host cell according to claim 159 wherein the yeast cell is a Candida sp. 

161. A host cell according to claim 160 wherein the Candida sp. is Candida
tropicalis.



162. A host cell according to claim 161 wherein the Candida tropicalis is 
Candida tropicalis 20336. 
163. A host cell according to claim 162 wherein the Candida tropicalis is H5343 ura-.



164. A method of producing a CYP52A8B protein including an amino acid
sequence as set forth in SEQ ID NO: 103 comprising:

a) transforming a suitable host cell with a DNA sequence that encodes the protein
having the amino acid sequence as set forth in SEQ ID NO: 103; and 

b) culturing the cell under conditions favoring the expression of the protein.

165. The method according to claim 164 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell.

166. Isolated nucleic acid encoding a CYP52D4A protein having the amino acid 
sequence set forth in SEQ ID NO: 104. 

167. Isolated nucleic acid comprising a coding region defined by nucleotides
767-2266 as set forth in SEQ ID NO: 94.

168. Isolated nucleic acid according to claim 167 comprising the nucleotide
sequence as set forth in SEQ ID NO: 94.

169. Isolated protein comprising an amino acid sequence as set forth in SEQ ID
NO: 104.

170. A vector comprising a nucleotide sequence encoding CYP52D4A protein
including an amino acid sequence as set forth in SEQ ID NO: 104.

171. A vector according to claim 170 wherein the nucleotide sequence is set forth
in nucleotides 767-2266 of SEQ ID NO: 94. 

172. A vector according to claim 170 wherein the vector is selected from the
group consisting of plasmid, phagemid, phage and cosmid. 
173. A host cell transfected or transformed with the nucleic acid of claim 166.

174. A host cell according to claim 173 wherein the host cell is a yeast cell.

175. A host cell according to claim 174 wherein the yeast cell is a Candida sp.

176. A host cell according to claim 175 wherein the Candida sp. is Candida
tropicalis.

177. A host cell according to claim 176 wherein the Candida tropicalis is
Candida tropicalis 20336.

178. A host cell according to claim 177 wherein the Candida tropicalis is H5343 ura-.

179. A method of producing a CYP52D4A protein including an amino acid 
sequence as set forth in SEQ ID NO: 104 comprising: 

a) transforming a suitable host cell with a DNA sequence that encodes the protein 
having the amino acid sequence as set forth in SEQ ID NO: 104; and 
b) culturing the cell under conditions favoring the expression of the protein.

180. The method according to claim 179 wherein the step of culturing the cell
comprises adding an organic substrate to media containing the cell. 

181. A method for discriminating members of a gene family by quantifying the
amount of target mRNA in a sample comprising:

a) providing an organism containing a target gene; 

b) culturing the organism with an organic substrate which causes 
upregulation in the activity of the target gene; 

c) obtaining a sample of total RNA from the organism at a first point in
time;
d) combining at least a portion of the sample of the total RNA with a
known amount of competitor RNA to form an RNA mixture, wherein the competitor RNA is
substantially similar to the target mRNA but has a lesser number of nucleotides compared to the
target mRNA;

e) adding reverse transcriptase to the RNA mixture in a quantity sufficient 
to form corresponding target DNA and competitor DNA; 

f) conducting a polymerase chain reaction in the presence of at least one 
primer specific for at least one substantially non-homologous region of the target DNA within the 
gene family, the primer also specific for the competitor DNA; 

g) repeating steps (c-f) using increasing amounts of the competitor RNA 
while maintaining a substantially constant amount of target RNA; 

h) determining the point at which the amount of target DNA is
substantially equal to the amount of competitor DNA;

i) quantifying the results by comparing the ratio of the concentration of
unknown target to the known concentration of competitor; and

j) obtaining a sample of total RNA from the organism at another point in
time and repeating steps (d-i).

182. A method according to claim 181 wherein the target gene is selected from
the group consisting of a CPR gene and a CYP gene. 

183. A method according to claim 182 wherein the CPR gene is selected from the
group consisting of a CPRA gene (SEQ ID NO: 81) and a CPRB gene (SEQ ID NO: 82). 

184. A method according to claim 182 wherein the CYP gene is selected from the
group consisting of CYP52A1A gene (SEQ ID NO: 85), CYP52A2A gene (SEQ ID NO: 86), 
CYP52A2B gene (SEQ ID NO: 87), CYP52A3A gene (SEQ ID NO: 88), CYP52A3B gene (SEQ 
ID NO: 89), CYP52A5A gene (SEQ ID NO: 90), CYP52A5B gene (SEQ ID NO: 91), CYP52A8A
gene (SEQ ID NO: 92), CYP52A8B gene (SEQ ID NO: 93) and CYP52D4A gene (SEQ ID NO: 
94). 
185. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CPRA genes; 

b) increasing, in the host cell, the number of CPRA genes which encode a CPRA 
protein having the amino acid sequence as set forth in SEQ ID NO: 83; 

c) culturing the host cell in media containing an organic substrate which
upregulates the CPRA gene, to effect increased production of dicarboxylic acid.

186. A method for increasing the production of a CPRA protein having an amino
acid sequence as set forth in SEQ ID NO: 83 comprising: 

a) transforming a host cell having a naturally occurring amount of CPRA protein
with an increased copy number of a CPRA gene that encodes the CPRA protein having the amino
acid sequence as set forth in SEQ ID NO: 83; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CPRA gene. 

187. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CPRB genes; 

b) increasing, in the host cell, the number of CPRB genes which encode a CPRB 
protein having the amino acid sequence as set forth in SEQ ID NO: 84; 

c) culturing the host cell in media containing an organic substrate which
upregulates the CPRB gene, to effect increased production of dicarboxylic acid.

188. A method for increasing the production of a CPRB protein having an amino
acid sequence as set forth in SEQ ID NO: 84 comprising: 

a) transforming a host cell having a naturally occurring amount of CPRB protein
with an increased copy number of a CPRB gene that encodes the CPRB protein having the amino
acid sequence as set forth in SEQ ID NO: 84; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CPRB gene.



189. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A1A genes;
b) increasing, in the host cell, the number of CYP52A1A genes which encode a
CYP52A1A protein having the amino acid sequence as set forth in SEQ ID NO: 95; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A1A gene, to effect increased production of dicarboxylic acid.

190. A method for increasing the production of a CYP52A1A protein having an 
amino acid sequence as set forth in SEQ ID NO: 95 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A1A
protein with an increased copy number of a CYP52A1A gene that encodes the CYP52A1A protein
having the amino acid sequence as set forth in SEQ ID NO: 95; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A1A gene. 

191. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A2A genes; 

b) increasing, in the host cell, the number of CYP52A2A genes which encode a 
CYP52A2A protein having the amino acid sequence as set forth in SEQ ID NO: 96; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A2A gene, to effect increased production of dicarboxylic acid. 

192. A method for increasing the production of a CYP52A2A protein having an
amino acid sequence as set forth in SEQ ID NO: 96 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A2A
protein with an increased copy number of a CYP52A2A gene that encodes the CYP52A2A protein 
having the amino acid sequence as set forth in SEQ ID NO: 96; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A2A gene. 

193. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CYP52A2B genes; 

b) increasing, in the host cell, the number of CYP52A2B genes which encode a 
CYP52A2B protein having the amino acid sequence as set forth in SEQ ID NO: 97; 
c) culturing the host cell in media containing an organic substrate which
upregulates the CYP52A2B gene, to effect increased production of dicarboxylic acid. 

194. A method for increasing the production of a CYP52A2B protein having an 
amino acid sequence as set forth in SEQ ID NO: 97 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A2B 
protein with an increased copy number of a CYP52A2B gene that encodes the CYP52A2B protein 
having the amino acid sequence as set forth in SEQ ID NO: 97; and 

b) culturing the cell and thereby increasing expression of the protein compared
with that of a host cell containing a naturally occurring copy number of the CYP52A2B gene. 

195. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A3A genes; 

b) increasing, in the host cell, the number of CYP52A3A genes which encode a 
CYP52A3A protein having the amino acid sequence as set forth in SEQ ID NO: 98; 

c) culturing the host cell in media containing an organic substrate which
upregulates the CYP52A3A gene, to effect increased production of dicarboxylic acid. 

196. A method for increasing the production of a CYP52A3A protein having an 
amino acid sequence as set forth in SEQ ID NO: 98 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A3A
protein with an increased copy number of a CYP52A3A gene that encodes the CYP52A3A protein 
having the amino acid sequence as set forth in SEQ ID NO: 98; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A3A gene. 

197. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A3B genes; 
b) increasing, in the host cell, the number of CYP52A3B genes which encode a
CYP52A3B protein having the amino acid sequence as set forth in SEQ ID NO: 99;

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A3B gene, to effect increased production of dicarboxylic acid. 
198. A method for increasing the production of a CYP52A3B protein having an
amino acid sequence as set forth in SEQ ID NO: 99 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A3B 
protein with an increased copy number of a CYP52A3B gene that encodes the CYP52A3B protein 
having the amino acid sequence as set forth in SEQ ID NO: 99; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A3B gene. 

199. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CYP52A5A genes;

b) increasing, in the host cell, the number of CYP52A5A genes which encode a 
CYP52A5A protein having the amino acid sequence as set forth in SEQ ID NO: 100; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A5A gene, to effect increased production of dicarboxylic acid. 

200. A method for increasing the production of a CYP52A5A protein having an 
amino acid sequence as set forth in SEQ ID NO: 100 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52A5A
protein with an increased copy number of a CYP52A5A gene that encodes the CYP52A5A protein 
having the amino acid sequence as set forth in SEQ ID NO: 100; and

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A5A gene.

201. A method for increasing production of a dicarboxylic acid comprising:

a) providing a host cell having a naturally occurring number of CYP52A5B genes; 

b) increasing, in the host cell, the number of CYP52A5B genes which encode a 
CYP52A5B protein having the amino acid sequence as set forth in SEQ ID NO: 101; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A5B gene, to effect increased production of dicarboxylic acid.

202. A method for increasing the production of a CYP52A5B protein having an 
amino acid sequence as set forth in SEQ ID NO: 101 comprising: 
a) transforming a host cell having a naturally occurring amount of CYP52A5B
protein with an increased copy number of a CYP52A5B gene that encodes the CYP52A5B protein 
having the amino acid sequence as set forth in SEQ ID NO: 101; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A5B gene. 

203. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A8A genes;

b) increasing, in the host cell, the number of CYP52A8A genes which encode a 
CYP52A8A protein having the amino acid sequence as set forth in SEQ ID NO: 102; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A8A gene, to effect increased production of dicarboxylic acid. 

204. A method for increasing the production of a CYP52A8A protein having an 
amino acid sequence as set forth in SEQ ID NO: 102 comprising: 

a) transforming a host cell having a naturally occurring amount of CYP52A8A 
protein with an increased copy number of a CYP52A8A gene that encodes the CYP52A8A protein 
having the amino acid sequence as set forth in SEQ ID NO: 102; and

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A8A gene. 

205. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52A8B genes; 

b) increasing, in the host cell, the number of CYP52A8B genes which encode a 
CYP52A8B protein having the amino acid sequence as set forth in SEQ ID NO: 103; 

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52A8B gene, to effect increased production of dicarboxylic acid. 

206. A method for increasing the production of a CYP52A8B protein having an 
amino acid sequence as set forth in SEQ ID NO: 103 comprising: 
a) transforming a host cell having a naturally occurring amount of CYP52A8B
protein with an increased copy number of a CYP52A8B gene that encodes the CYP52A8B protein 
having the amino acid sequence as set forth in SEQ ID NO: 103; and

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52A8B gene.

207. A method for increasing production of a dicarboxylic acid comprising: 

a) providing a host cell having a naturally occurring number of CYP52D4A genes; 

b) increasing, in the host cell, the number of CYP52D4A genes which encode a 
CYP52D4A protein having the amino acid sequence as set forth in SEQ ID NO: 104;

c) culturing the host cell in media containing an organic substrate which 
upregulates the CYP52D4A gene, to effect increased production of dicarboxylic acid. 

208. A method for increasing the production of a CYP52D4A protein having an 
amino acid sequence as set forth in SEQ ID NO: 104 comprising:

a) transforming a host cell having a naturally occurring amount of CYP52D4A 
protein with an increased copy number of a CYP52D4A gene that encodes the CYP52D4A protein 
having the amino acid sequence as set forth in SEQ ID NO: 104; and 

b) culturing the cell and thereby increasing expression of the protein compared 
with that of a host cell containing a naturally occurring copy number of the CYP52D4A gene.

209. A method for discriminating members of a gene family according to claim 
181 wherein culturing the organism with the organic substrate is accomplished in a fermentor. 
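The titration in steps (g) through (i) of claim 181 can be illustrated with a short Python sketch. It assumes, as is commonly done in competitive RT-PCR analysis, that the logarithm of the target-to-competitor product ratio falls linearly with the logarithm of the competitor input, so that the competitor input at which the ratio equals 1 estimates the starting amount of target. The function name `equivalence_point`, the least-squares fit and the optional length correction for the shorter competitor are illustrative assumptions, not features recited in the claims.

```python
import math
from typing import Sequence


def equivalence_point(
    competitor_inputs: Sequence[float],
    target_signals: Sequence[float],
    competitor_signals: Sequence[float],
    target_length: int = 1,
    competitor_length: int = 1,
) -> float:
    """Estimate the starting target amount from a competitive RT-PCR titration.

    Hypothetical helper (sketch only): fits log10(target/competitor ratio)
    against log10(competitor input) by least squares and returns the
    competitor input at which the fitted ratio equals 1.
    """
    n = len(competitor_inputs)
    if not (n == len(target_signals) == len(competitor_signals)):
        raise ValueError("all series must have the same length")
    if n < 2:
        raise ValueError("need at least two titration points")
    # Assumed length correction: mass-based signals (e.g. stained bands) are
    # converted to a molar ratio, since the competitor product is shorter.
    length_factor = competitor_length / target_length
    xs = [math.log10(c) for c in competitor_inputs]
    ys = [
        math.log10(t / c * length_factor)
        for t, c in zip(target_signals, competitor_signals)
    ]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("competitor inputs must vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    if slope == 0:
        raise ValueError("ratio does not change with competitor input")
    intercept = mean_y - slope * mean_x
    # Solve slope * x + intercept = 0, i.e. log ratio 0 (target = competitor).
    return 10 ** (-intercept / slope)


# Made-up data for an ideal reaction containing 1e6 target copies.
inputs = [1e4, 1e5, 1e6, 1e7, 1e8]
print(round(equivalence_point(inputs, [1e6] * 5, inputs)))  # 1000000
```

With mass-based band intensities, passing the two amplicon lengths converts the signal ratio to a molar ratio; when the signals are already molar, the default lengths of 1 leave the ratio unchanged.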
ABSTRACT

Novel genes have been isolated which encode cytochrome P450 and NADPH reductase 
enzymes of the ω-hydroxylase complex of C. tropicalis 20336. Vectors including these genes,
transfected host cells and transformed host cells are provided. Methods of producing
cytochrome P450 and NADPH reductase enzymes are also provided which involve transforming 
a host cell with a gene encoding these enzymes and culturing the cells. Methods of increasing the 
production of a dicarboxylic acid and methods of increasing production of the aforementioned 
enzymes are also provided which involve increasing in the host cell the number of genes
encoding these enzymes. A method for discriminating members of a gene family by quantifying
the expression of genes is also provided. 